When I train a model using the Hugging Face Trainer from transformers==4.47.1, the loss is accumulated by sum instead of mean. More precisely, tr_loss_step is not divided by the number of gradient accumulation steps, so the reported loss scales proportionally with the global batch size.
When I change the transformers version to 4.44.0, this problem disappears and everything works well.
My global batch size is set to 128 here. You can see from the image above that tr_loss_step = original_loss / 128, which is not the case with transformers 4.47.1, where tr_loss_step = original_loss.
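For illustration only, here is a minimal plain-Python sketch (no Trainer involved; the four per-micro-batch loss values are made up) of how summing the per-micro-batch contributions instead of averaging them makes the logged loss scale with the number of accumulation steps:

```python
# Toy illustration: how the logged loss differs when per-micro-batch
# losses are summed rather than averaged over accumulation steps.
# All loss values below are invented for the example.

micro_batch_losses = [0.9, 1.1, 1.0, 1.2]  # one loss per micro-batch
accumulation_steps = len(micro_batch_losses)

# Behavior reported for 4.44.0: each step's contribution is divided by
# the number of accumulation steps, so the logged value is a mean.
mean_logged = sum(l / accumulation_steps for l in micro_batch_losses)

# Behavior reported for 4.47.1: contributions are summed as-is, so the
# logged value grows with the number of accumulation steps.
sum_logged = sum(micro_batch_losses)

print(mean_logged)  # ~1.05
print(sum_logged)   # ~4.2, i.e. accumulation_steps times larger
```

With a fixed per-GPU micro-batch size, raising the global batch size raises the number of accumulation steps, which is why the reported loss appears to scale with the global batch size.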
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Run any training with transformers==4.47.1 and gradient accumulation enabled (global batch size larger than the per-device batch size).
Expected behavior
The reported loss should be the mean over accumulation steps (as in 4.44.0), not the sum.
Can you try with the main branch of transformers? We fixed a couple of issues with gradient accumulation, so your issue might be solved. Otherwise, can you share a minimal reproducer? Thanks!