Proposal
The autotuner should re-tune and select a new kernel configuration whenever the product (batch size × number of heads) changes.
Rationale
The performance of the autotuned kernel can vary significantly when the product (batch size × number of heads) changes, because the batch and head dimensions determine how much parallelism is available to the kernel. A configuration tuned for one value of this product is not necessarily the fastest for another.
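To make the proposal concrete, here is a minimal sketch of a configuration cache keyed on (batch size × number of heads). Everything in it is hypothetical and for illustration only: the `AutotuneCache` class, the `(BLOCK_M, BLOCK_N, num_warps)` config tuple, and the `fake_benchmark` heuristic are assumptions, not this project's code. If the kernel is tuned with `triton.autotune` (an assumption; the issue does not say), the analogous change would likely be adding an argument that carries this product to the decorator's `key` list.

```python
from typing import Callable, Dict, Tuple

# Hypothetical config layout: (BLOCK_M, BLOCK_N, num_warps). Not taken from the project.
Config = Tuple[int, int, int]


class AutotuneCache:
    """Caches the best config per tuning key, so an unseen key triggers a new benchmark."""

    def __init__(self, benchmark: Callable[[int], Config]) -> None:
        self._benchmark = benchmark
        self._best: Dict[int, Config] = {}
        self.tuning_runs = 0  # counts how often the benchmark actually ran

    def get(self, batch: int, heads: int) -> Config:
        key = batch * heads  # the product named in the proposal
        if key not in self._best:
            self.tuning_runs += 1
            self._best[key] = self._benchmark(key)
        return self._best[key]


def fake_benchmark(batch_heads: int) -> Config:
    # Stand-in for real timing runs; the threshold and values are placeholders.
    return (128, 64, 4) if batch_heads >= 64 else (64, 64, 4)


cache = AutotuneCache(fake_benchmark)
small = cache.get(batch=2, heads=8)    # product 16: benchmark runs
same = cache.get(batch=4, heads=4)     # product 16 again: served from cache
large = cache.get(batch=8, heads=16)   # product 128: benchmark runs again
```

Keying on the product rather than on batch size and head count separately means shapes with the same product share a tuned configuration. That holds only if the kernel treats the two dimensions interchangeably, which is an assumption of this sketch.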
Autotuning should also take the total sequence length into account, as the sequence length dimension provides parallelism in addition to the number of heads and batch size.
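A rough sketch of why sequence length matters, assuming the kernel launches one program per block of `block_m` query rows for each (batch, head) pair. That launch layout, the function names, and the power-of-two bucketing are assumptions made for illustration, not details confirmed in this thread.

```python
import math
from typing import Tuple


def num_programs(batch: int, heads: int, seqlen: int, block_m: int) -> int:
    """Independent work items under the assumed layout: one per (batch, head, row block)."""
    return batch * heads * math.ceil(seqlen / block_m)


def tuning_key(batch: int, heads: int, seqlen: int) -> Tuple[int, int]:
    """Hypothetical autotune key: (batch * heads, seqlen rounded up to a power of two)."""
    seqlen_bucket = 1 << (seqlen - 1).bit_length()
    return (batch * heads, seqlen_bucket)


# Same batch * heads, very different amounts of parallelism:
short_seq = num_programs(batch=1, heads=8, seqlen=128, block_m=128)
long_seq = num_programs(batch=1, heads=8, seqlen=4096, block_m=128)
```

Bucketing the sequence length keeps the number of distinct keys, and therefore the number of tuning runs, bounded when sequence lengths vary from call to call.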