[New Model] RWKV #1902
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1902      +/-   ##
==========================================
- Coverage   93.95%   92.97%   -0.98%
==========================================
  Files         125      126       +1
  Lines       11773    11923     +150
==========================================
+ Hits        11061    11086      +25
- Misses        712      837     +125
Hi @JanFidor, and thanks for this PR. Just to let you know that we're wrapping up the last few things for the release in 1-2 weeks. Once that's done we'll come back to this and review 🚀
@JanFidor Were you able to benchmark this model?
@gdevos010 just some basic ones, I still have to play around with parameter initializations. On SunspotsDataset I noticed that NLinear and Transformer had noticeable MAPE changes depending on output_chunk_length (ranging between roughly 60 and 200), while RWKV consistently performed around 100. I also threw in the ETTh1 dataset, with input_chunk_length 720 and output_chunk_length 336. There, RWKV had terrible MAPE. I'm not sure if the architecture was at fault or if it was caused by underfitting. I'll try to make a more comprehensive benchmark next week.
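For readers who want to reproduce this kind of comparison, here is a minimal, library-free sketch of sweeping the forecast horizon and scoring each run with MAPE. Everything in it is an assumption for illustration: the helper names (`mape`, `naive_last_value_forecast`, `sweep_horizons`) are hypothetical, MAPE is taken in the percentage convention (100 times the mean absolute relative error), and a naive last-value forecaster stands in for the real models, which are not reproduced here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed convention)."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def naive_last_value_forecast(history, horizon):
    """Hypothetical stand-in model: repeat the last observed value."""
    return [history[-1]] * horizon


def sweep_horizons(series, input_chunk_length, horizons):
    """Score the stand-in model on the tail of `series` for each horizon."""
    results = {}
    for h in horizons:
        # Hold out the last h points; feed the preceding window as input.
        history = series[-(input_chunk_length + h):-h]
        target = series[-h:]
        results[h] = mape(target, naive_last_value_forecast(history, h))
    return results


if __name__ == "__main__":
    toy_series = [float(x) for x in range(1, 11)]
    print(sweep_horizons(toy_series, input_chunk_length=4, horizons=[1, 2, 3]))
```

Swapping the stand-in forecaster for an actual model and the toy series for a real dataset would give a per-horizon table like the one described in the comment above.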
Fixes #1817.
Quick summary
For now, the implementation closely follows what was described in the paper. The implementation in the official RWKV repo has quite a few improvements that weren't discussed in the paper, but I first wanted to get at least a workable model.
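As background for reviewers, the following is a rough sketch of the WKV weighted average at the core of RWKV time mixing, as I understand it from the paper. It is not the code in this PR. It handles a single channel with scalar parameters, the names `wkv_naive` and `wkv_recurrent` are hypothetical, `w` is assumed to be a non-negative decay and `u` the bonus applied to the current token, and no numerical stabilization is done, so large keys can overflow.

```python
import math


def wkv_naive(k, v, w, u):
    """Direct sum over all past positions: O(T^2) for a length-T sequence."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]  # current token gets the bonus u
        den = math.exp(u + k[t])
        for i in range(t):
            # Past tokens decay by w per step of distance from t - 1.
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(k, v, w, u):
    """Same quantity computed with a running state: O(T)."""
    a, b = 0.0, 0.0  # decayed sums of exp(k_i) * v_i and exp(k_i)
    out = []
    for t in range(len(k)):
        bonus = math.exp(u + k[t])
        out.append((a + bonus * v[t]) / (b + bonus))
        # Decay the state once, then absorb the current token without the bonus.
        a = math.exp(-w) * a + math.exp(k[t]) * v[t]
        b = math.exp(-w) * b + math.exp(k[t])
    return out


if __name__ == "__main__":
    keys = [0.1, -0.3, 0.5, 0.0]
    values = [1.0, 2.0, -1.0, 0.5]
    print(wkv_naive(keys, values, w=0.5, u=0.2))
    print(wkv_recurrent(keys, values, w=0.5, u=0.2))
```

The two functions should agree up to floating-point error. The recurrent form is what makes inference cost linear in sequence length, while a practical implementation would additionally need to guard the exponentials against overflow.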
Roadmap
There are still a lot of things to be done, but I wanted to put up a PR as a quick update on how everything's going and a simple roadmap for the future