[Bug]: KV Cache exploded #91

Open
rakkit opened this issue Dec 14, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@rakkit

rakkit commented Dec 14, 2024

Describe the bug

When using softmax attention, or any other attention with window_size=None, the KV cache update falls into this branch. That logic concatenates all historical sequence states with the new states (attn_state[0] and attn_state[1]), causing exponential growth of the KV cache.
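
A minimal sketch of the failure mode (hypothetical shapes and names, not the actual fla code): if the incoming states already contain the history, concatenating them onto the stored history again roughly doubles the cache every step.

import torch

cache_k = torch.zeros(1, 0, 64)  # stored key history, starts empty
for step in range(5):
    # layer output that (per the bug) already includes the cached history
    new_k = torch.cat([cache_k, torch.randn(1, 1, 64)], dim=1)
    # the unconditional concat in the cache update appends it again
    cache_k = torch.cat([cache_k, new_k], dim=1)
    print(step, cache_k.shape[1])  # 1, 3, 7, 15, 31 -> exponential blow-up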

Steps to reproduce the bug

Run inference with an attention layer configured with window_size=None.

Expected behavior

The KV cache should grow only by the new tokens at each step; instead it explodes.

Environment info

@rakkit rakkit added the bug Something isn't working label Dec 14, 2024
@yzhangcs
Member

@rakkit Hello, could you provide a minimal script for reproduction? I didn't hit the errors you reported when setting window_size=None.

@rakkit
Author

rakkit commented Jan 6, 2025

Hi, I tried to reproduce this for a while, but I can't see the problem anymore. I'm not sure if I messed something up there. Sorry about that.

@rakkit rakkit closed this as completed Jan 6, 2025
@sustcsonglin
Collaborator

@rakkit Great to know!

@rakkit
Author

rakkit commented Jan 7, 2025

Actually, the Cache update unconditionally calls .get on cache_kwargs to read window_size, here:

window_size = cache_kwargs.get('window_size', None)

However, cache_kwargs is set to None by default, as defined here:

cache_kwargs: Optional[Dict[str, Any]] = None

In the FLA attention implementation, cache_kwargs is explicitly passed as a dictionary for cache updates, which sidesteps the problem:

cache_kwargs = dict(window_size=self.window_size)

However, users who are unaware of this requirement and leave cache_kwargs as None will get an error (.get is called on None).
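
A minimal illustration of the failure (the names mirror the snippets above):

cache_kwargs = None                                  # the default from the signature above
window_size = cache_kwargs.get('window_size', None)  # AttributeError: 'NoneType' object has no attribute 'get'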

@rakkit rakkit reopened this Jan 7, 2025
@yzhangcs
Member

Thank you, I will check it later. Would you be willing to open a PR?

@rakkit
Author

rakkit commented Jan 14, 2025

The easiest solution here:

if cache_kwargs is not None:
    window_size = cache_kwargs.get('window_size', None)
else:
    window_size = None

or

cache_kwargs = {'window_size': None} if cache_kwargs is None else cache_kwargs
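
Either guard avoids calling .get on None. A compact equivalent (my wording, not part of the proposed patch) would be:

window_size = (cache_kwargs or {}).get('window_size', None)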
