Introduce ShawRelativePositionSDPA. #90
Conversation
Force-pushed from 64868d7 to f64d89d.
Like the other PRs, this is in pretty good shape. Most of my comments are nitpicks.
        if self.rel_v_embedding is not None:
            nn.init.xavier_uniform_(self.rel_v_embedding.weight)

    def rel_position_indices(self, seq_len: int) -> Tensor:
I suggest passing a device argument here and constructing the tensor on that device instead of moving it later (i.e. line 128).
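For illustration, a rough standalone sketch of what I mean; the function name follows the PR, but max_rel_pos and the exact signature are assumptions rather than the actual code:

```python
from typing import Optional

import torch
from torch import Tensor


def rel_position_indices(
    seq_len: int, max_rel_pos: int, device: Optional[torch.device] = None
) -> Tensor:
    """Return a (seq_len, seq_len) matrix of clipped, shifted relative distances."""
    # Build directly on the requested device so no later .to(device) move is needed.
    positions = torch.arange(seq_len, device=device)
    rel_pos = positions.unsqueeze(0) - positions.unsqueeze(1)
    # Clip to the maximum relative distance, as in Shaw et al. (2018).
    rel_pos = rel_pos.clamp(-max_rel_pos, max_rel_pos)
    # Shift to [0, 2 * max_rel_pos] so the values can index an embedding table.
    return rel_pos + max_rel_pos
```

Constructing the indices on the right device up front avoids an extra host-to-device copy on every forward pass.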
Not necessarily required for this PR, but I wonder whether we should cache the generated indices (see relative_position.py as an example) in the future to avoid repeatedly initializing a mostly static buffer.
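Not the fairseq2 API, just a sketch of one possible caching pattern using a non-persistent buffer (all names here are illustrative):

```python
import torch
from torch import Tensor, nn


class CachedRelPositionIndices(nn.Module):
    """Caches the relative position index matrix and rebuilds it only when needed."""

    def __init__(self, max_rel_pos: int, init_seq_len: int = 512) -> None:
        super().__init__()
        self.max_rel_pos = max_rel_pos
        # Non-persistent buffer: follows the module's device moves,
        # but is not saved in the state dict.
        self.register_buffer(
            "indices", self._build(init_seq_len, torch.device("cpu")), persistent=False
        )

    def _build(self, seq_len: int, device: torch.device) -> Tensor:
        positions = torch.arange(seq_len, device=device)
        rel_pos = (positions.unsqueeze(0) - positions.unsqueeze(1)).clamp(
            -self.max_rel_pos, self.max_rel_pos
        )
        return rel_pos + self.max_rel_pos

    def forward(self, seq_len: int) -> Tensor:
        # Rebuild only if the cached matrix is too small; otherwise slice it.
        if self.indices.size(0) < seq_len:
            self.indices = self._build(seq_len, self.indices.device)
        return self.indices[:seq_len, :seq_len]
```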
Ah yeah, I was wondering how to do this as well!
You mean relative_attention.py?
@@ -145,6 +158,9 @@ class Wav2Vec2EncoderConfig:
    conv_norm_type: Literal["batch_norm", "layer_norm"]
    """The type of norm layer in the Conformer convolution module."""

    shaw_rel_position_sdpa_config: Optional[ShawRelativePositionSDPAConfig]
Nice. It would be a BC-breaking change, but I think it would be nicer if we wrapped the parameters for the "conv" positional encoder in a dataclass in the same way. Not relevant for this PR though; just food for thought :)
Could you elaborate on "wrap parameters for the 'conv' positional encoder in the same way in a dataclass"? I don't understand it at all.
It is not related to this PR at all. In Wav2Vec2EncoderConfig we have several attributes related to the default convolutional position encoder. We could consolidate them under a single configuration dataclass, like you did for the Shaw encoder.
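Purely as an illustration of the idea, with made-up field names rather than the actual Wav2Vec2EncoderConfig attributes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConvPositionEncoderConfig:
    """Hypothetical grouping of the conv positional encoder attributes."""

    kernel_size: int = 128
    """The kernel size of the convolutional position encoder."""

    num_groups: int = 16
    """The number of groups of the convolution."""


@dataclass
class Wav2Vec2EncoderConfig:
    # ... the other existing fields would remain here ...

    conv_pos_encoder_config: Optional[ConvPositionEncoderConfig] = None
    """The configuration of the default convolutional position encoder, if used."""
```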
Yes, we should do that; I was thinking of that as well.
Force-pushed from c4d5e71 to 5241bd1.
What does this PR do? Please describe:
Implement the relative position SDPA as described in https://doi.org/10.48550/arxiv.1803.02155
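For context, here is a minimal self-contained sketch of the attention computation from the paper; it illustrates the technique only and is not the actual ShawRelativePositionSDPA implementation in this PR:

```python
import math

import torch
from torch import Tensor


def shaw_relative_sdpa(
    q: Tensor,            # (batch, heads, seq, head_dim)
    k: Tensor,            # (batch, heads, seq, head_dim)
    v: Tensor,            # (batch, heads, seq, head_dim)
    rel_k_embed: Tensor,  # (seq, seq, head_dim) relative key embeddings a^K
    rel_v_embed: Tensor,  # (seq, seq, head_dim) relative value embeddings a^V
) -> Tensor:
    """Scaled dot-product attention with Shaw-style relative position terms."""
    head_dim = q.size(-1)

    # Content-based scores: (batch, heads, seq_q, seq_k).
    scores = torch.matmul(q, k.transpose(-1, -2))

    # Relative-key scores: for each query position i, add q_i . a^K_ij.
    scores = scores + torch.einsum("bhid,ijd->bhij", q, rel_k_embed)

    attn = torch.softmax(scores / math.sqrt(head_dim), dim=-1)

    # Content values plus the relative-value term a^V_ij.
    out = torch.matmul(attn, v)
    out = out + torch.einsum("bhij,ijd->bhid", attn, rel_v_embed)
    return out
```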
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: