Fused moe gemm + silu activation kernel #710
Draft
!!Merge only after the tianxing/moe_quantization branch is merged!!
Implemented the moe gemm + silu_and_mul fusion. The logic of the fusion is:
[0, 0 + BLOCK_N // 2, 1, 1 + BLOCK_N // 2, 2, 2 + BLOCK_N // 2, ..., BLOCK_N // 2 - 1, BLOCK_N - 1]
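The index pattern above pairs each index i with i + BLOCK_N // 2. One plausible reading, which is an assumption and not taken from the PR's kernel code, is that each BLOCK_N-wide output tile holds BLOCK_N // 2 gate columns followed by BLOCK_N // 2 up columns, and that interleaving them lets the epilogue reshape the tile to (M, BLOCK_N // 2, 2) and split it into gate and up. The NumPy sketch below illustrates only that idea; `interleaved_indices`, `fused_epilogue`, and `reference_silu_and_mul` are hypothetical names, and the [gate | up] layout is assumed.

```python
import numpy as np


def interleaved_indices(block_n: int) -> np.ndarray:
    """Return [0, h, 1, h + 1, ..., h - 1, 2h - 1] with h = block_n // 2."""
    assert block_n % 2 == 0, "block_n must be even"
    half = block_n // 2
    idx = np.empty(block_n, dtype=np.int64)
    idx[0::2] = np.arange(half)         # even slots: first half (assumed gate)
    idx[1::2] = np.arange(half) + half  # odd slots: second half (assumed up)
    return idx


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def fused_epilogue(tile: np.ndarray, block_n: int) -> np.ndarray:
    """Apply silu(gate) * up to one (M, block_n) tile via the interleaved order.

    Assumes the tile columns are laid out as [gate | up].
    """
    interleaved = tile[:, interleaved_indices(block_n)]
    # After interleaving, each adjacent column pair is (gate_i, up_i).
    pairs = interleaved.reshape(tile.shape[0], block_n // 2, 2)
    gate, up = pairs[..., 0], pairs[..., 1]
    return silu(gate) * up


def reference_silu_and_mul(tile: np.ndarray, block_n: int) -> np.ndarray:
    """Unfused reference: activate the first half, multiply by the second."""
    half = block_n // 2
    return silu(tile[:, :half]) * tile[:, half:]


rng = np.random.default_rng(0)
tile = rng.standard_normal((4, 8))
assert np.allclose(fused_epilogue(tile, 8), reference_silu_and_mul(tile, 8))
```

In this sketch the output has BLOCK_N // 2 columns per tile, which matches the usual silu_and_mul convention of halving the last dimension.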
You can benchmark the fused kernel (moe gemm + silu activation) with the `-use_silu_activation` flag.
Implemented a separate `silu_and_mul` kernel. You can benchmark the moe gemm + silu activation combo as separate kernels on Triton with the `-use_silu_activation_non_fused` flag.
The fused kernel:
moe gemm + silu activation as separate kernels:
The core Triton team is a small number of people, and we receive many PRs (thank
you!). To help us review your code more quickly, if you are a new
contributor (fewer than 3 PRs merged) we ask that you complete the following
tasks and include the filled-out checklist in your PR description.
Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.
- I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - This PR does not need a test because `FILL THIS IN`.
- Select one of the following.
  - I have not added any `lit` tests.
  - The `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)