
RelationRNN training issues: maxSeqLen crash and zero or infinite loss #4

Open
Vimos opened this issue May 2, 2018 · 1 comment


Vimos commented May 2, 2018

If you use the default maxSeqLen, you get a cublas runtime error:

➜  RelationRNN git:(master) ✗ th train_rel_rnn.lua               
[INFO - 2018_05_02_20:11:11] - "--------------------------------------------------"
[INFO - 2018_05_02_20:11:11] - "SeqRankingLoader Configurations:"
[INFO - 2018_05_02_20:11:11] - "    number of batch : 296"
[INFO - 2018_05_02_20:11:11] - "    data batch size : 256"
[INFO - 2018_05_02_20:11:11] - "    neg sample size : 1024"
[INFO - 2018_05_02_20:11:11] - "    neg sample range: 7524"
[INFO - 2018_05_02_20:11:11] - "--------------------------------------------------"
[INFO - 2018_05_02_20:11:11] - "BiGRU Configuration:"
[INFO - 2018_05_02_20:11:11] - "    inputSize   :   300"
[INFO - 2018_05_02_20:11:11] - "    hiddenSize  :   256"
[INFO - 2018_05_02_20:11:11] - "    maxSeqLen   :    40"
[INFO - 2018_05_02_20:11:11] - "    maxBatch    :   256"
[INFO - 2018_05_02_20:11:11] - "--------------------------------------------------"
[INFO - 2018_05_02_20:11:11] - "BiGRU Configuration:"
[INFO - 2018_05_02_20:11:11] - "    inputSize   :   512"
[INFO - 2018_05_02_20:11:11] - "    hiddenSize  :   256"
[INFO - 2018_05_02_20:11:11] - "    maxSeqLen   :    40"
[INFO - 2018_05_02_20:11:11] - "    maxBatch    :   256"
/home/vimos/.torch/install/bin/luajit: /home/vimos/.torch/install/share/lua/5.1/nn/Container.lua:67: 
In 5 module of nn.Sequential:
/home/vimos/Data/git/QA/CFO/src/model/BiGRU.lua:241: cublas runtime error : an internal operation failed at /home/vimos/.torch/extra/cutorch/lib/THC/THCBlas.cu:246
stack traceback:
	[C]: in function 'mm'
	/home/vimos/Data/git/QA/CFO/src/model/BiGRU.lua:241: in function 'updateGradInput'
	/home/vimos/.torch/install/share/lua/5.1/nn/Module.lua:31: in function </home/vimos/.torch/install/share/lua/5.1/nn/Module.lua:29>
	[C]: in function 'xpcall'
	/home/vimos/.torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	/home/vimos/.torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
	train_rel_rnn.lua:174: in main chunk
	[C]: in function 'dofile'
	...mos/.torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x559ae9bad710

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
	[C]: in function 'error'
	/home/vimos/.torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
	/home/vimos/.torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
	train_rel_rnn.lua:174: in main chunk
	[C]: in function 'dofile'
	...mos/.torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x559ae9bad710
THCudaCheckWarn FAIL file=/home/vimos/.torch/extra/cutorch/lib/THC/THCStream.cpp line=50 error=77 : an illegal memory access was encountered
THCudaCheckWarn FAIL file=/home/vimos/.torch/extra/cutorch/lib/THC/THCStream.cpp line=50 error=77 : an illegal memory access was encountered
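
For reference, the illegal memory access happens inside BiGRU:updateGradInput, which is consistent with a batch whose sequences are longer than the buffers pre-allocated for maxSeqLen time steps. A hypothetical sanity check (the loader call and names below are assumptions for illustration, not the repo's actual API) could catch this before the backward pass:

-- Hypothetical check, not part of CFO: make sure the batch fits the
-- buffers that BiGRU pre-allocates for maxSeqLen time steps.
local seq = loader:nextBatch(1)   -- assumed to return a seqLen x batchSize tensor
if seq:size(1) > opt.maxSeqLen then
    error(string.format('batch sequence length %d exceeds maxSeqLen %d',
                        seq:size(1), opt.maxSeqLen))
end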

But the crash can be avoided by passing a larger maxSeqLen:

➜  RelationRNN git:(master) ✗ th train_rel_rnn.lua -maxSeqLen 42
[INFO - 2018_05_02_20:11:52] - "--------------------------------------------------"
[INFO - 2018_05_02_20:11:52] - "SeqRankingLoader Configurations:"
[INFO - 2018_05_02_20:11:52] - "    number of batch : 296"
[INFO - 2018_05_02_20:11:52] - "    data batch size : 256"
[INFO - 2018_05_02_20:11:52] - "    neg sample size : 1024"
[INFO - 2018_05_02_20:11:52] - "    neg sample range: 7524"
[INFO - 2018_05_02_20:11:52] - "--------------------------------------------------"
[INFO - 2018_05_02_20:11:52] - "BiGRU Configuration:"
[INFO - 2018_05_02_20:11:52] - "    inputSize   :   300"
[INFO - 2018_05_02_20:11:52] - "    hiddenSize  :   256"
[INFO - 2018_05_02_20:11:52] - "    maxSeqLen   :    42"
[INFO - 2018_05_02_20:11:52] - "    maxBatch    :   256"
[INFO - 2018_05_02_20:11:52] - "--------------------------------------------------"
[INFO - 2018_05_02_20:11:52] - "BiGRU Configuration:"
[INFO - 2018_05_02_20:11:52] - "    inputSize   :   512"
[INFO - 2018_05_02_20:11:52] - "    hiddenSize  :   256"
[INFO - 2018_05_02_20:11:52] - "    maxSeqLen   :    42"
[INFO - 2018_05_02_20:11:52] - "    maxBatch    :   256"
[INFO - 2018_05_02_20:11:56] - "iter  100, loss = 0.00198258"........] ETA: 3h29m | Step: 42ms       
[INFO - 2018_05_02_20:12:00] - "iter  200, loss = 0.00000000"........] ETA: 3h25m | Step: 41ms       
[INFO - 2018_05_02_20:12:04] - "epoch   1, loss 0.00066979"..........] ETA: 3h28m | Step: 42ms       
[INFO - 2018_05_02_20:12:04] - "iter  300, loss = 0.00000000"........] ETA: 3h27m | Step: 42ms       
[INFO - 2018_05_02_20:12:09] - "iter  400, loss = 0.00000000"........] ETA: 3h28m | Step: 42ms       
[INFO - 2018_05_02_20:12:13] - "iter  500, loss = 0.00000000"........] ETA: 3h25m | Step: 41ms       
[INFO - 2018_05_02_20:12:17] - "epoch   2, loss 0.00000000"..........] ETA: 3h26m | Step: 41ms       
[INFO - 2018_05_02_20:12:17] - "iter  600, loss = 0.00000000"........] ETA: 3h26m | Step: 41ms       
[INFO - 2018_05_02_20:12:21] - "iter  700, loss = 0.00000000"........] ETA: 3h28m | Step: 42ms       
[INFO - 2018_05_02_20:12:25] - "iter  800, loss = 0.00000000"........] ETA: 3h27m | Step: 42ms       
[INFO - 2018_05_02_20:12:29] - "epoch   3, loss 0.00000000"..........] ETA: 3h25m | Step: 41ms       
[INFO - 2018_05_02_20:12:30] - "iter  900, loss = 0.00000000"........] ETA: 3h25m | Step: 41ms       
[INFO - 2018_05_02_20:12:34] - "iter 1000, loss = 0.00000000"........] ETA: 3h27m | Step: 42ms 

However, the loss drops to 0 after the first epoch, or blows up to infinity with a different seed:

➜  RelationRNN git:(master) ✗ th train_rel_rnn.lua -maxSeqLen 42 -seed 12
[INFO - 2018_05_02_20:26:49] - "--------------------------------------------------"
[INFO - 2018_05_02_20:26:49] - "SeqRankingLoader Configurations:"
[INFO - 2018_05_02_20:26:49] - "    number of batch : 296"
[INFO - 2018_05_02_20:26:49] - "    data batch size : 256"
[INFO - 2018_05_02_20:26:49] - "    neg sample size : 1024"
[INFO - 2018_05_02_20:26:49] - "    neg sample range: 7524"
[INFO - 2018_05_02_20:26:49] - "--------------------------------------------------"
[INFO - 2018_05_02_20:26:49] - "BiGRU Configuration:"
[INFO - 2018_05_02_20:26:49] - "    inputSize   :   300"
[INFO - 2018_05_02_20:26:49] - "    hiddenSize  :   256"
[INFO - 2018_05_02_20:26:49] - "    maxSeqLen   :    42"
[INFO - 2018_05_02_20:26:49] - "    maxBatch    :   256"
[INFO - 2018_05_02_20:26:49] - "--------------------------------------------------"
[INFO - 2018_05_02_20:26:49] - "BiGRU Configuration:"
[INFO - 2018_05_02_20:26:49] - "    inputSize   :   512"
[INFO - 2018_05_02_20:26:49] - "    hiddenSize  :   256"
[INFO - 2018_05_02_20:26:49] - "    maxSeqLen   :    42"
[INFO - 2018_05_02_20:26:49] - "    maxBatch    :   256"
[INFO - 2018_05_02_20:26:53] - "iter  100, loss = 81231552070126006809284050944.00000000" 41ms       
[INFO - 2018_05_02_20:26:57] - "iter  200, loss = 0.00000000"........] ETA: 3h15m | Step: 39ms       
[INFO - 2018_05_02_20:27:01] - "epoch   1, loss 27443091915583111597203128320.00000000"p: 40ms       
[INFO - 2018_05_02_20:27:01] - "iter  300, loss = 0.00000000"........] ETA: 3h17m | Step: 40ms  
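
For what it's worth, with a margin-based ranking loss a value of exactly 0 just means every negative sample already satisfies the margin, while the huge loss under -seed 12 looks like gradients exploding in the first iterations. A minimal diagnostic sketch one could drop into the training loop (the names loss, gradParams and iter are assumptions about the loop, not the repo's actual variables):

-- Hypothetical diagnostics, not in train_rel_rnn.lua: abort on NaN/Inf loss
-- and clip gradients before the parameter update.
if loss ~= loss or loss == math.huge then   -- NaN or Inf check
    error(string.format('training diverged at iter %d, loss = %f', iter, loss))
end
gradParams:clamp(-5, 5)   -- crude gradient clipping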
@ThisIsSoMe

I have not run the code, but I would like to know how many usable examples (those where the subject mention can be found in the question) remain in the train (75910) and test (21678) sets after preprocessing. Would you mind answering this? I would appreciate it.
