
After the update, rwkv6's loss becomes NaN #19

Closed
JL-er opened this issue May 16, 2024 · 17 comments
Labels
bug Something isn't working

Comments

@JL-er commented May 16, 2024

I'm currently using the version from a few days ago, and the loss is normal.

@yzhangcs (Member)

Oh, it looks like you may need to switch back to logsigmoid; -exp is not stable yet.
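
For context, a minimal sketch of the two decay parameterizations being discussed; the tensor names here are illustrative, not fla's actual code:

```python
import torch
import torch.nn.functional as F

w = torch.randn(2, 8, 64, requires_grad=True)  # raw decay logits (illustrative)

# logsigmoid parameterization: log decay in (-inf, 0); its gradient
# sigmoid(-w) stays bounded in (0, 1), so training is numerically tame
g_stable = F.logsigmoid(w)

# -exp parameterization: also <= 0, but exp(w) is unbounded, so both the
# value and its gradient -exp(w) can blow up to inf for large w
g_unstable = -torch.exp(w)
```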

@JL-er (Author) commented May 16, 2024

[screenshot: loss curves]
This works; the loss is very stable, with essentially no deviation.

@JL-er (Author) commented May 16, 2024

[screenshot]
It should be a problem with this update.

@yzhangcs (Member)

This update fixes potential NaNs during inference, so I don't think it's the issue.
It's possibly caused by a potential inf gradient of -exp; I will check it, thank you.

@JL-er (Author) commented May 16, 2024

RWKV-PEFT has added fla, and it currently works. But as soon as I switch to the new fla, the loss goes NaN. If there are future fla updates, let me know and I can test them.

@JL-er (Author) commented May 16, 2024

[screenshot]
I don't know why fla's rwkv6 is actually not faster than CUDA; when I tested gla before, it was much faster.

@yzhangcs (Member)

Have you compared the kernel speeds?
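
For an apples-to-apples comparison it helps to time the kernels in isolation rather than end-to-end training; a sketch using CUDA events (the helper and the example argument lists are assumptions, not fla utilities):

```python
import torch

def bench_ms(fn, *args, warmup=10, iters=100):
    # average milliseconds per call, measured with CUDA events
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# e.g. bench_ms(fla_rwkv6_op, q, k, v, w, u) vs. bench_ms(cuda_rwkv6_op, ...)
```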

@JL-er (Author) commented May 16, 2024

I'll find time to test it. By the way, there's another issue: when I'm doing state tuning, swapping in the fla kernel raises an error.
[screenshot of the error]
It's probably because the state's gradient isn't saved, so I'd like to ask how to solve this?

@yzhangcs (Member)

You can enable gradients for h0 manually.

@yzhangcs (Member)

Would taking h0 as a learnable param be OK? Like h0 = nn.Parameter(torch.zeros(key_dim, head_dim)).
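
A slightly fuller sketch of that idea; the class, shapes, and names are assumptions, to be matched to whatever layout the kernel expects for its initial state:

```python
import torch
import torch.nn as nn

class LearnableInitState(nn.Module):
    # hypothetical helper: one learnable initial state per head,
    # broadcast across the batch on each forward pass
    def __init__(self, num_heads: int, key_dim: int, head_dim: int):
        super().__init__()
        self.h0 = nn.Parameter(torch.zeros(num_heads, key_dim, head_dim))

    def forward(self, batch_size: int) -> torch.Tensor:
        # expand keeps h0 a leaf parameter; its grad sums over the batch
        return self.h0.unsqueeze(0).expand(batch_size, -1, -1, -1)
```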

@JL-er (Author) commented May 16, 2024

[screenshots]
It runs fine when I use the CUDA kernel, but not with fla; normally the state's gradient is saved automatically when the kernel computes it.

@JL-er (Author) commented May 16, 2024

One more thing: I've frozen all the other weights here and kept only the state's gradient.
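
That setup would look something like the following; `model` and the `h0` parameter name follow the hypothetical sketch above:

```python
# freeze everything, then leave gradients enabled only for the state
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("h0")
```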

@yzhangcs (Member)

I see; currently there is no access to the gradients of the states.
We will add an option later.

@JL-er (Author) commented May 16, 2024

Thank you.

@yzhangcs yzhangcs pinned this issue May 17, 2024
@yzhangcs yzhangcs unpinned this issue May 17, 2024
@sustcsonglin sustcsonglin added the bug Something isn't working label May 18, 2024
@yzhangcs (Member) commented May 24, 2024

@JL-er Hi, check it out: 1547448

We no longer truncate the gradients of the h states for RWKV6, for ease of state tuning.
Do contact us if you meet any bugs or numerical stability issues :-D
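
A minimal state-tuning sketch against that change, assuming the chunk_rwkv6 op accepts `initial_state`/`output_final_state` keyword arguments as in fla's ops of that period (import path, shapes, and dtypes here are illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from fla.ops.rwkv6 import chunk_rwkv6  # import path is an assumption

B, H, T, K, V = 2, 4, 128, 64, 64
q = torch.randn(B, H, T, K, device='cuda')
k = torch.randn(B, H, T, K, device='cuda')
v = torch.randn(B, H, T, V, device='cuda')
w = F.logsigmoid(torch.randn(B, H, T, K, device='cuda'))  # log decay <= 0
u = torch.randn(H, K, device='cuda')

# learnable initial state: with this fix, backward should now reach h0
h0 = torch.zeros(B, H, K, V, device='cuda', requires_grad=True)
o, ht = chunk_rwkv6(q, k, v, w, u, initial_state=h0, output_final_state=True)
o.sum().backward()
print(h0.grad is not None)  # True once state gradients are propagated
```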

yzhangcs added a commit that referenced this issue May 25, 2024
@JL-er (Author) commented May 27, 2024

Testing on rwkv-peft works perfectly; clipping is no longer needed. However, infctx training at 6000 ctx len occasionally went NaN before (I will retest). Thank you very much!

@sustcsonglin (Collaborator)

FYI, we've recently fixed a bug that caused NaNs when the log decay is very small. #77 (comment)
