Why use the meta-epoch training paradigm? #18
Comments
@DonaldRR could you point me to a line of code where "the loss is backpropagated for each sub-iteration -- only part of the batch is sampled"? I don't think I implemented anything like you described: meta/sub-epochs are just there to introduce flexibility for the train/test phases.
Oops, I mean the FIBER iteration. During each fiber iteration, N (=256) features are sampled for loss computation and backpropagation (lines starting at line 65).
@DonaldRR I see. The number of feature vectors (fibers) in a feature map (tensor) can be large enough to fill all the memory available to a flow model. So it is better to sample random feature vectors from a number of feature maps. Hence, your original post is on point :)
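For readers following along, here is a minimal PyTorch sketch of the idea being discussed: flattening encoder activations into per-position fibers, then training the flow on random sub-batches of N fibers with a backward pass per sub-batch. This is not the repo's actual code; the names (`flow_model`, `feature_map`, `condition`, `N`) and the FrEIA-style conditional call signature are assumptions made purely for illustration.

```python
import torch

def train_on_fibers(flow_model, optimizer, feature_map, condition, N=256):
    """Illustrative sketch, not the repo's implementation.

    feature_map: (B, C, H, W) encoder activations
    condition:   (B, P, H, W) positional/conditioning tensor
    """
    B, C, H, W = feature_map.shape
    # Flatten spatial positions into fibers of dimension C
    fibers = feature_map.permute(0, 2, 3, 1).reshape(-1, C)                  # (B*H*W, C)
    cond = condition.permute(0, 2, 3, 1).reshape(-1, condition.shape[1])     # (B*H*W, P)

    perm = torch.randperm(fibers.shape[0])
    total_loss = 0.0
    for start in range(0, fibers.shape[0], N):
        idx = perm[start:start + N]
        # Assumed FrEIA-style conditional flow returning (z, log_jac_det)
        z, log_jac_det = flow_model(fibers[idx], c=[cond[idx]])
        # Negative log-likelihood under a standard normal base distribution (up to a constant)
        loss = (0.5 * torch.sum(z ** 2, dim=1) - log_jac_det).mean()
        optimizer.zero_grad()
        loss.backward()      # backprop per sub-batch of N fibers
        optimizer.step()
        total_loss += loss.item()
    return total_loss
```

The sub-batch loop keeps the flow's forward/backward pass bounded to N fibers at a time, which is what keeps GPU memory usage manageable when B*H*W is large.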
@gudovskiy Thank you for providing this excellent repo. Does training on sub-batches of fibers serve any purpose other than conserving memory? Could I remove this loop and process all the fibers in one shot if I have sufficient GPU memory to do so?
@PSZehnder yes
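For reference, the single-pass variant being asked about would collapse the loop into one forward/backward pass over all fibers, roughly as sketched below (same illustrative, assumed names as in the sketch above; only viable if everything fits in GPU memory).

```python
def train_on_fibers_single_pass(flow_model, optimizer, feature_map, condition):
    """Illustrative single-pass variant: all fibers processed in one shot."""
    B, C, H, W = feature_map.shape
    fibers = feature_map.permute(0, 2, 3, 1).reshape(-1, C)
    cond = condition.permute(0, 2, 3, 1).reshape(-1, condition.shape[1])
    z, log_jac_det = flow_model(fibers, c=[cond])   # assumed FrEIA-style API
    loss = (0.5 * torch.sum(z ** 2, dim=1) - log_jac_det).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```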
Hi,
@Howie86 N should not change test results |
@gudovskiy Thanks for your reply, I understand. |
Hi,
Thanks for your code; it has helped my research a lot.
One question came to mind while implementing cFlow in my own code: usually a model is trained on a batch of data, where the loss is reduced over the whole batch and then backpropagated.
However, I found that here the loss is backpropagated for each sub-iteration -- only part of the batch is sampled.
This training paradigm somewhat confuses me. Does it work better than the usual approach?
Here are the points I surmise as to why it works:
Thanks!