Error in the implementation of MsPoELlamaRotaryEmbedding #5

Open

HaozheZhao opened this issue Jul 31, 2024 · 1 comment

HaozheZhao commented Jul 31, 2024

The following is your implementation of the MsPoELlamaRotaryEmbedding:

    def forward(self, x, seq_len=None):
        # x: [bs, num_attention_heads, seq_len, head_size]
        if seq_len > self.max_seq_len_cached:
            self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)

        return (
            self.cos_cached[:,:seq_len].to(dtype=x.dtype),
            self.sin_cached[:,:seq_len].to(dtype=x.dtype),
        )

However, since x has the shape [bs, num_attention_heads, seq_len, head_size], shouldn't the correct implementation be:

    def forward(self, x, seq_len=None):
        # x: [bs, num_attention_heads, seq_len, head_size]
        if seq_len > self.max_seq_len_cached:
            self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)

        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )

This would also align with the original RoPE implementation in LLaMA.
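To make the difference concrete, here is a minimal sketch of which axis each slicing expression truncates. It uses NumPy arrays as stand-ins for the cached tensors, since basic slicing behaves the same way for this case. Both cache layouts are assumptions for illustration only (the issue does not show `_set_cos_sin_cache`), and the helper names `slice_issue_original` and `slice_issue_proposed` are hypothetical.

```python
import numpy as np


def slice_issue_original(cache, seq_len):
    # Slicing from the quoted forward(): truncates axis 1.
    return cache[:, :seq_len]


def slice_issue_proposed(cache, seq_len):
    # Slicing proposed in this issue: truncates axis 2.
    return cache[:, :, :seq_len, ...]


num_heads, max_len, head_size, seq_len = 4, 16, 8, 5

# Assumed layout A: one table per head, [num_heads, max_len, head_size].
per_head = np.zeros((num_heads, max_len, head_size))
# Assumed layout B: broadcastable cache, [1, 1, max_len, head_size].
llama_style = np.zeros((1, 1, max_len, head_size))

print(slice_issue_original(per_head, seq_len).shape)     # (4, 5, 8)
print(slice_issue_proposed(per_head, seq_len).shape)     # (4, 16, 5)
print(slice_issue_original(llama_style, seq_len).shape)  # (1, 1, 16, 8)
print(slice_issue_proposed(llama_style, seq_len).shape)  # (1, 1, 5, 8)
```

So which version is right depends on the layout that `_set_cos_sin_cache` actually produces: with a 3-D per-head cache the original slicing cuts the sequence axis, while with a 4-D cache only the proposed slicing does.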

zhenghuawang6 commented Dec 10, 2024

If batch_size=1, it is OK!
