Hello, thanks for your work~
I have a few questions about the experiments.
Why do you train for three epochs? (Maybe this is unfair.)
You set n_query = 100 and report only 15 steps, so only 1,500 samples are used in total. Why don't you report the subsequent experimental results?
For my own experiment, I used the same model and dataset (llama2-7b, dolly-15k), but set n_query=500 and train_epoch=1.
I tested three rounds (rd=3, rd=15, rd=20), but the results are not as good as those reported in your paper.
rd=3:
rd=15:
rd=20:
Could you give me an explanation or some guidance?
Any reply will be appreciated.
Hi! Thank you very much for your questions! I hope the following explanations help clarify our methodology and findings.
Why did we train for 3 epochs in each round?
In each "round" of our experiment, we fine-tune the LLaMA model from scratch, using a dataset that increases by n_query data points with every iteration. The decision to train for 3 epochs per round adheres to the Alpaca-Style hyperparameters, as detailed here: https://github.com/tatsu-lab/stanford_alpaca, ensuring consistency across all iterations.
Why didn’t we run experiments for more rounds, i.e., on more datapoints?
The primary goal of our research was to explore efficient instruction tuning with reduced training data. We found that beyond a certain point, adding more data to the training set did not significantly improve performance. In our initial trials we did run more steps, specifically 20-30, but observed only marginal performance gains, if any.
Figure 2 in our paper also shows diminishing returns in performance gain with more steps. This observation supports our finding that a significantly smaller subset of data can be just as effective for instruction tuning as using more data, in line with the conclusions of other studies such as 'Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning'.
How does the choice of n_query impact the results?
We greatly appreciate that you ran your own experiments. The selection of n_query is indeed a crucial factor. In our design, given a fixed overall subset budget, a smaller n_query paired with a larger number of iterations (n_round) allows the model to improve its performance gradually. Each iteration poses a manageable challenge (100 new samples) to the model, and over multiple iterations these smaller additions accumulate into substantial improvements. In contrast, a larger n_query per iteration, such as the 500 used in your experiment, can overwhelm the model's ability to select data points optimally at each step.
Our findings suggest that a lower n_query, like 100, in conjunction with more iterations, is more effective than a higher n_query with fewer iterations. For an extreme case of selecting the entire subset budget in one iteration versus our approach of 100 per round, please refer to Section 4.3 (Dynamic Iteration) in our paper.
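For a rough sense of the budget arithmetic (illustrative numbers only, assuming a 1,500-sample budget), the two settings reach the same total through very different numbers of selection steps:

```python
budget = 1500  # total labeled samples; matches 15 rounds x 100 in the paper

for name, n_query in [("paper setting", 100), ("issue setting", 500)]:
    n_rounds = budget // n_query
    print(f"{name}: {n_rounds} rounds x {n_query} samples = {n_rounds * n_query} total")

# paper setting: 15 rounds x 100 samples = 1500 total
# issue setting: 3 rounds x 500 samples = 1500 total
```

With the same budget, the n_query=100 schedule gets 15 opportunities to re-score the pool, while the n_query=500 schedule gets only 3.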