Question for models/trainer.py#L325 ? #114

Open
zjreno opened this issue May 19, 2021 · 3 comments

Comments

@zjreno

zjreno commented May 19, 2021

In https://github.com/nlpyang/BertSum/blob/master/src/models/trainer.py#L325, the loss has already been reduced with sum(), so loss.numel() must be 1. What is the difference between (loss / loss.numel()).backward() and loss.backward()?

So I guess loss.numel() was meant to express n_docs?
Can we use loss / normalization in place of (loss / loss.numel())?
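
The point of the question can be checked with a small plain-Python sketch. It is not the BertSum code: `numel`, `summed_loss`, `grad`, the toy data, and the choice of `normalization` are all hypothetical stand-ins, and a finite-difference estimate takes the place of `backward()`. It only illustrates that dividing an already-summed scalar loss by its element count (1) leaves the gradient unchanged, while dividing by a batch-level count rescales it.

```python
def numel(value):
    """Element count: len() for a flat list, 1 for a scalar (like Tensor.numel())."""
    return len(value) if isinstance(value, list) else 1


def summed_loss(w, xs, ys):
    """Squared error reduced with sum(), so the result is a single scalar."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))


def grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw (stands in for backward())."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


xs, ys, w = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0], 1.5
normalization = len(xs)  # hypothetical: e.g. the number of documents in the batch

# Gradient of the summed loss itself.
g_plain = grad(lambda v: summed_loss(v, xs, ys), w)
# Gradient of loss / loss.numel(): the divisor is 1, so nothing changes.
g_numel = grad(lambda v: summed_loss(v, xs, ys) / numel(summed_loss(v, xs, ys)), w)
# Gradient of loss / normalization: scaled by 1 / normalization.
g_norm = grad(lambda v: summed_loss(v, xs, ys) / normalization, w)

print(g_plain, g_numel, g_norm)
```

Under these assumptions the first two gradients are identical, and the third is the first divided by `normalization`. So replacing `loss.numel()` with `normalization` is not a no-op: it shrinks every gradient by that factor, which changes the effective step size for a plain gradient-descent update.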

zjreno changed the title from "Question for models/trainer.py#L325 ," to "Question for models/trainer.py#L325 ?" on May 19, 2021
@Anothernewcomer

Hi, I have the same question. What was your conclusion?

@haidequanbu

Hi, I hit an error at this statement:

Traceback (most recent call last):
  File "train.py", line 340, in <module>
    train(args, device_id)
  File "train.py", line 272, in train
    trainer.train(train_iter_fct, args.train_steps)
  File "/root/code/BertSum/src/models/trainer.py", line 155, in train
    self._gradient_accumulation(
  File "/root/code/BertSum/src/models/trainer.py", line 326, in _gradient_accumulation
    loss.div(float(normalization)).backward()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Is the error related to this statement? Or have you solved it?
Pardon my poor English!
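
As the error message notes, CUDA kernels run asynchronously, so the line reported in the traceback (here the `backward()` call) is not necessarily where the assert fired. A minimal sketch of enabling the suggested `CUDA_LAUNCH_BLOCKING=1` setting from Python follows; placing it at the very top of the entry script is an assumption about how the training run is launched.

```python
import os

# Must be set before CUDA is initialised, so do it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and start training only after this point
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

The same effect can be had by setting the variable in the shell that starts the training script. With synchronous launches the traceback should point at the operation that actually failed, at the cost of slower execution.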

@haidequanbu

OK, I have already solved the problem. It was caused by the BCE loss that is computed earlier: the model output has to go through a sigmoid layer before it is passed to the loss.
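
Assuming "BCE" here means a binary cross-entropy loss such as `torch.nn.BCELoss`, that loss expects probabilities in [0, 1], and raw scores outside that range are a plausible source of a device-side assert. The plain-Python sketch below is not PyTorch's implementation; `sigmoid` and `bce` are hypothetical helpers that mimic the range check to show why the sigmoid is needed.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and target y in {0, 1}."""
    if not 0.0 <= p <= 1.0:
        # Stand-in for the range check that fails on the GPU.
        raise ValueError(f"prediction {p} is outside [0, 1]")
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


raw_score = 2.3  # unnormalised model output (a logit)

try:
    bce(raw_score, 1.0)
except ValueError as err:
    print("without sigmoid:", err)

print("with sigmoid:", bce(sigmoid(raw_score), 1.0))
```

In PyTorch, an alternative to adding the sigmoid layer is `torch.nn.BCEWithLogitsLoss`, which takes raw scores and applies the sigmoid internally.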
