[Feature Request] Weight Conversion #120
Comments
@Triang-jyed-driung Hello, I've actually already included some conversion scripts, covering Llama and RWKV6. The Mamba code in fla is adapted from HF, so it can directly load HF weights from the HF hub. I'd be very glad if you could contribute conversions for other linear models.
Check out https://huggingface.co/collections/fla-hub/rwkv6-665aaa86d4714ed3f8595aec. Both models are converted from the official checkpoints.
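For anyone landing here, a minimal usage sketch for loading one of the converted checkpoints through transformers. The repo id below is a placeholder; substitute a real one from the collection linked above.

```python
# Minimal sketch: loading a converted checkpoint from the fla-hub collection.
# The repo id is a placeholder; pick an actual one from the linked collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "fla-hub/rwkv6-1.6B-pile"  # placeholder repo id

# trust_remote_code is needed if the model class ships with the checkpoint
# rather than with the installed transformers version.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Hello, linear attention!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```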
The original RWKV-6 weights come from BlinkDL, so what you got are second-hand weights.
@Triang-jyed-driung We've done some in-place multiplications, so the backward pass might not work.
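To illustrate the in-place issue, a generic PyTorch sketch (not fla's actual kernel code): an in-place multiplication can overwrite a tensor that autograd saved for the backward pass, and the backward call then fails.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x.sigmoid()  # autograd saves sigmoid's output to compute its backward
y.mul_(2.0)      # in-place: overwrites the saved tensor
try:
    y.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    print(e)

# The out-of-place version works fine:
x = torch.randn(4, requires_grad=True)
y = x.sigmoid() * 2.0
y.sum().backward()
```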
I see, but it's OK as long as the variable names are identical.
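On the name-matching point: a generic sketch of what such a conversion script boils down to. The key mapping below is purely illustrative; the real prefixes differ per model, and the actual correspondences live in the repo's conversion scripts.

```python
# Generic weight-conversion sketch: load an official checkpoint, rename the
# state-dict keys to the target naming scheme, and save the result.
import torch

def convert(src_path: str, dst_path: str, key_map: dict[str, str]) -> None:
    state = torch.load(src_path, map_location="cpu")
    converted = {}
    for name, tensor in state.items():
        new_name = name
        for old_prefix, new_prefix in key_map.items():
            if new_name.startswith(old_prefix):
                new_name = new_prefix + new_name[len(old_prefix):]
                break
        converted[new_name] = tensor
    torch.save(converted, dst_path)

# Hypothetical mapping, for illustration only:
convert("rwkv6-official.pth", "rwkv6-fla.pth",
        key_map={"blocks.": "model.layers.", "emb.": "model.embeddings."})
```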
Packing does not work for RLHF; no packing is supported by https://huggingface.co/docs/trl/en/ppo_trainer.
@Triang-jyed-driung Thank you; you can try the current layers.
Feature Request
Convert official weights to Flash-linear-attention's format, including RWKV, Mamba, etc.
Motivation
The official model weights of many linear models (especially the RWKV series) cannot be loaded directly by Flash-linear-attention's code. Currently, some SFT and RLHF implementations rely heavily on a correct implementation of `attention_mask` for padding and truncating variable-length sequences (RWKV's `attention_mask` is known to fail, and Mamba's `attention_mask` works in the forward pass but likely not in the backward pass; see the sketch at the end of this post). If we can convert the pre-trained weights to this module, making them compatible with HF and supporting variable lengths, we can easily implement SFT and RLHF for linear models.
Your Contribution
I'd like to try SFT and RLHF on some linear models.
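As referenced in the Motivation above, a minimal sketch of the variable-length batching that a correct `attention_mask` implementation would need to support. The checkpoint name is a placeholder; any converted fla model would stand in here.

```python
# Minimal sketch of variable-length batching with padding + attention_mask.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "fla-hub/rwkv6-1.6B-pile"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["a short sequence", "a considerably longer sequence that needs more tokens"],
    padding=True,  # pads shorter sequences up to the batch maximum
    return_tensors="pt",
)

# For SFT, the loss on pad positions must also be masked; labels of -100
# are ignored by the cross-entropy loss in transformers.
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100

out = model(**batch, labels=labels)
out.loss.backward()  # this backward pass is exactly what the issue says may fail
```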