[Bug] The offset when updating KV cache seems incorrect. #126

t1101675 · 2025-01-19T01:50:03Z

Describe the Bug

The offset argument in the following call tells the cache how many new tokens were processed in the current step:

    past_key_value.update(
        recurrent_state=recurrent_state,
        conv_state=(conv_state_q, conv_state_k, conv_state_v) if self.use_short_conv else None,
        layer_idx=self.layer_idx,
        offset=q.shape[2]
    )

The call above corresponds to this line. However, q has shape (b, t, h, d) after the rearrange operation, so q.shape[2] is the number of heads rather than the number of tokens. offset=q.shape[1] seems more reasonable.
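To make the axis mix-up concrete, here is a minimal sketch that uses NumPy in place of the real tensors. The sizes and the reshape (standing in for a rearrange pattern along the lines of 'b t (h d) -> b t h d') are assumptions for illustration, not taken from the library code.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
batch, seq_len, num_heads, head_dim = 2, 7, 4, 16

# Projection output before the head split: (batch, seq_len, num_heads * head_dim).
q = np.zeros((batch, seq_len, num_heads * head_dim))

# Assumed equivalent of rearrange(q, 'b t (h d) -> b t h d', h=num_heads).
q = q.reshape(batch, seq_len, num_heads, head_dim)

tokens_axis = q.shape[1]  # number of tokens in this step (seq_len)
heads_axis = q.shape[2]   # number of heads, the same at every step

print(q.shape, tokens_axis, heads_axis)
```

With this layout, q.shape[2] stays fixed at the head count no matter how many tokens are fed in, while q.shape[1] follows the number of tokens.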

Steps to Reproduce the Bug

import fla
from transformers import AutoModelForCausalLM, AutoTokenizer
name = 'fla-hub/gla-1.3B-100B'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).cuda()
input_prompt = "Power goes with permanence. Impermanence is impotence. And rotation is castration."
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=64)

Then print q.shape[2] right before the past_key_value.update call.

Expected Behavior

q.shape[2] = 4

Environment Information

Same as that in README.

t1101675 added the bug Something isn't working label Jan 19, 2025
