Add attentive layer to Jepa #927
I am out running a couple of errands; I will review before EOD.
src/fairseq2/recipes/jepa/models.py
def forward(
    self, seqs: Tensor, padding_mask: PaddingMask | None
) -> tuple[Tensor, PaddingMask | None]:
    if self.encoder:
Are we sure this forward implementation is accurate? In the reference implementation here, I see that the encoder stack is applied after cross-attention, which is the reverse of what we have in this PR.
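To make the difference concrete, here is a minimal sketch of the two orderings, assuming NumPy and heavily simplified blocks (single head, no projections, norms, padding masks, or MLPs). The helper names `self_attention_block`, `pool_reference_order`, and `pool_encoder_first` are hypothetical and are not fairseq2 or JEPA APIs. Both variants return one vector per learned query token, but they are not equivalent: in the reference order the encoder blocks only refine the pooled query tokens, while in the encoder-first order they process the full input sequence before pooling.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Single-head scaled dot-product attention over the last two axes.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def self_attention_block(x):
    # Hypothetical stand-in for one encoder layer: residual self-attention
    # only, which is enough to show how data flows through the stack.
    return x + attention(x, x, x)


def pool_reference_order(seqs, query_tokens, num_blocks=1):
    # Order the review describes for the reference implementation:
    # cross-attention first (query tokens attend to seqs), then the
    # encoder blocks run over the pooled query tokens only.
    batch = seqs.shape[0]
    q = np.broadcast_to(query_tokens, (batch,) + query_tokens.shape)
    q = q + attention(q, seqs, seqs)
    for _ in range(num_blocks):
        q = self_attention_block(q)
    return q


def pool_encoder_first(seqs, query_tokens, num_blocks=1):
    # Order used by the forward() under review: encoder blocks over the
    # full input sequence first, then cross-attention from the query tokens.
    x = seqs
    for _ in range(num_blocks):
        x = self_attention_block(x)
    batch = seqs.shape[0]
    q = np.broadcast_to(query_tokens, (batch,) + query_tokens.shape)
    return q + attention(q, x, x)
```

Besides changing the output, the ordering changes cost: in the encoder-first variant self-attention scales with the input sequence length, while in the reference order it only runs over the small set of query tokens.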
The internal code seems to deviate from this.
LGTM!
What does this PR do? Please describe:
This PR follows up on #889 by adding the building blocks (models, builders, loader) for the fine-tuned JEPA encoder, plus scripts for testing the models on different downstream tasks.
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Checklist: