[RFC] Autotune should consider batch size and number of heads #117

Open
sustcsonglin opened this issue Jan 11, 2025 · 1 comment
Labels
enhancement New feature or request urgent

Comments

@sustcsonglin
Collaborator

Proposal

The kernel configuration selected by autotuning should be re-evaluated whenever (batch size × number of heads) changes, rather than reusing a configuration tuned for a different shape.

Rationale

The performance of the autotuned kernel can vary significantly when the product of (batch size × number of heads) changes, because the batch and head dimensions determine how much grid-level parallelism is available: a configuration tuned for one level of parallelism may underutilize or oversubscribe the GPU at another.
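
As a minimal sketch of what this could look like with Triton's `@triton.autotune` decorator (the kernel, its arguments, and the configs below are hypothetical placeholders, not the repo's actual kernels), the key point is listing `B` and `H` in `key=` so that a change in either value triggers a fresh tuning pass:

```python
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        triton.Config({'BT': 32}, num_warps=2),
        triton.Config({'BT': 64}, num_warps=4),
        triton.Config({'BT': 64}, num_warps=8),
    ],
    # B and H are scalar kernel arguments; when either changes, Triton
    # re-benchmarks all configs instead of reusing a stale winner.
    key=['B', 'H'],
)
@triton.jit
def copy_kernel(x_ptr, o_ptr, B, H, T, BT: tl.constexpr):
    # Trivial placeholder body: each program copies one tile of BT elements.
    pid = tl.program_id(0)
    offs = pid * BT + tl.arange(0, BT)
    mask = offs < B * H * T
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(o_ptr + offs, x, mask=mask)
```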

@sustcsonglin sustcsonglin added the enhancement New feature or request label Jan 11, 2025
@sustcsonglin sustcsonglin added this to the FLA v1.0.0 release milestone Jan 11, 2025
@sustcsonglin
Collaborator Author

Autotuning should also take the total sequence length into account, as the sequence length dimension provides parallelism in addition to the number of heads and batch size.
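
One way to key on sequence length without re-tuning for every distinct value would be to bucket it first, e.g. by rounding up to the next power of two. This is a sketch of a possible approach, not the repo's current behavior, and `T_BUCKET` is a hypothetical argument name:

```python
import triton


def seq_len_bucket(T: int) -> int:
    # Round the total sequence length up to the next power of two so that
    # nearby lengths share one tuned configuration instead of each distinct
    # length triggering a fresh autotuning pass.
    return triton.next_power_of_2(T)


# The bucketed value would then be passed to the kernel as an extra scalar
# argument and listed in the autotune key, e.g. key=['B', 'H', 'T_BUCKET'].
```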
