Enhance positional encoding adjustment in SparseCtrl loading with exp… #83
Enhanced the model loading process for SparseCtrl models by introducing the `expected_seq_len` parameter, which dynamically adjusts positional encoding (PE) dimensions. This keeps positional encodings compatible with models that expect different sequence lengths, making model loading more flexible, especially for models trained with a variety of configurations.

Changes include:

- Added `expected_seq_len` to `SparseSettings`, allowing for customizable sequence length settings.
- Adjusted the `pos_encoder.pe` parameters within the `adjust_positional_encoding_parameters` function, ensuring that the PE tensors match the expected sequence length.
- Updated the `SparseCtrlLoaderAdvanced` and `SparseCtrlMergedLoaderAdvanced` classes to use the `expected_seq_len` setting, so it is applied as part of the model loading workflow.

This upgrade addresses potential type mismatches and lets loaded models adapt to different sequence lengths, so users do not need to handle the adjustment themselves across differing model configurations.
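For reviewers who want a concrete picture of the adjustment described above, here is a minimal sketch of the idea, not the code in this PR. Everything in it is an assumption made for illustration: `SparseSettingsSketch` and `adjust_pe_sketch` are hypothetical stand-ins for `SparseSettings` and `adjust_positional_encoding_parameters`, nested lists stand in for PE tensors of shape `[1, seq_len, dim]` so the sketch runs without PyTorch, and the truncate-or-reject policy is a guess (the actual function may pad or interpolate instead).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Nested lists stand in for tensors of shape [1, seq_len, dim] (assumption).
PE = List[List[List[float]]]


@dataclass
class SparseSettingsSketch:
    """Hypothetical stand-in for SparseSettings; only the new field is shown."""
    expected_seq_len: Optional[int] = None  # None means "leave PEs untouched"


def adjust_pe_sketch(state_dict: Dict[str, PE],
                     expected_seq_len: Optional[int]) -> Dict[str, PE]:
    """Return a copy whose pos_encoder.pe entries have expected_seq_len positions.

    Assumed policy: longer encodings are truncated, shorter ones are rejected.
    """
    if expected_seq_len is None:
        return dict(state_dict)
    adjusted = {}
    for key, value in state_dict.items():
        if "pos_encoder.pe" in key:
            seq_len = len(value[0])
            if seq_len < expected_seq_len:
                raise ValueError(
                    f"{key}: has {seq_len} positions, need {expected_seq_len}"
                )
            # Slice along the sequence dimension only; dim is left unchanged.
            value = [batch[:expected_seq_len] for batch in value]
        adjusted[key] = value
    return adjusted


# Toy checkpoint: one PE with 32 positions of width 4, plus an unrelated weight.
toy_state = {
    "block.pos_encoder.pe": [[[0.0] * 4 for _ in range(32)]],
    "block.proj.weight": [[[1.0]]],
}
settings = SparseSettingsSketch(expected_seq_len=16)
result = adjust_pe_sketch(toy_state, settings.expected_seq_len)
print(len(result["block.pos_encoder.pe"][0]))  # prints 16
```

The sketch returns a new dictionary rather than mutating its input, so the original checkpoint contents remain available if the adjustment has to be retried with a different length.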