After the update, rwkv6 loss becomes NaN #19
Comments
Oh, looks like you may need to switch back to logsigmoid; -exp is not stable yet.
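A minimal sketch of the two decay parameterizations being contrasted here (the variable `w` is a hypothetical stand-in for the raw, unconstrained decay logits):

```python
import torch
import torch.nn.functional as F

# `w` stands in for raw decay logits; values chosen to show the extremes.
w = torch.tensor([-20.0, -1.0, 0.0, 1.0, 20.0])

# logsigmoid parameterization: log_w stays in (-inf, 0] and remains finite
# for any finite input, so exp(log_w) is a well-behaved decay in (0, 1).
log_w_stable = F.logsigmoid(w)

# -exp parameterization: log_w = -exp(w) blows up for large positive w
# (here -exp(20) is about -4.85e8), which can underflow the decay to
# exactly 0 and produce NaNs once gradients flow through it.
log_w_unstable = -torch.exp(w)
```

The point is not that one formula is "the" RWKV6 decay, but that the logsigmoid form bounds the log-decay smoothly while the -exp form can reach extreme magnitudes.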
This update fixes potential NaNs during inference, so I don't think it's the issue.
fla has been added to RWKV-PEFT and currently works. But as soon as I switch to the new fla, the loss goes NaN. If fla gets further updates, let me know and I can test them.
Have you compared the kernel speed?
You can enable gradients for h0 manually.
Would taking h0 as a learnable parameter be OK? Like
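One way this could look, as a hedged sketch (the module name, shapes, and `initial_state` helper below are all illustrative, not fla's actual API):

```python
import torch
import torch.nn as nn

class StateTuner(nn.Module):
    """Hypothetical wrapper holding a learnable initial state h0."""

    def __init__(self, num_heads: int, head_dim: int):
        super().__init__()
        # As an nn.Parameter, h0 has requires_grad=True by default,
        # so an optimizer can update it like any other weight.
        self.h0 = nn.Parameter(torch.zeros(num_heads, head_dim, head_dim))

    def initial_state(self, batch_size: int) -> torch.Tensor:
        # expand() keeps the autograd graph, so gradients from every
        # batch element flow back into the shared h0.
        return self.h0.unsqueeze(0).expand(batch_size, -1, -1, -1)

m = StateTuner(num_heads=4, head_dim=16)
h = m.initial_state(batch_size=2)
loss = h.pow(2).sum()
loss.backward()
```

After `backward()`, `m.h0.grad` is populated, which is the behavior state tuning needs.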
One more thing: on my side I froze all the other weights and kept gradients only for the state.
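Freezing everything except the state can be sketched like this (the tiny module is a hypothetical stand-in; real code would match on the actual state parameter names):

```python
import torch
import torch.nn as nn

class TinyRWKVLike(nn.Module):
    """Hypothetical module with a learnable state plus ordinary weights."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        self.state = nn.Parameter(torch.zeros(8))

    def forward(self, x):
        return self.proj(x) + self.state

model = TinyRWKVLike()

# Freeze all weights except the state, as in state-only tuning.
for name, p in model.named_parameters():
    p.requires_grad = "state" in name

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Only the state parameter remains trainable; the optimizer would then be built from `(p for p in model.parameters() if p.requires_grad)`.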
I see; currently there is no access to the grads of the states.
Thank you.
Testing on rwkv-peft works perfectly; clipping is no longer needed. However, infctx training at a 6000 ctx len previously produced occasional NaNs (I will retest). Thank you very much.
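Assuming "clipping" above refers to gradient-norm clipping, the usual workaround for occasional NaN or exploding gradients, a minimal sketch is:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model standing in for the training setup; the clipping call is the point.
model = nn.Linear(8, 8)
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()

# Rescale all gradients in place so their global L2 norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

With the kernel fix above, this safety net reportedly became unnecessary for the rwkv-peft runs.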
FYI, we've recently fixed a bug that caused NaNs when the log decay is very small. #77 (comment)
I'm currently using the version from a few days ago and the loss is normal.