The mma and wmma instructions are both used for matrix multiply-accumulate (MMA) computation, and there are several files for each of them under cutlass/arch. Under which circumstances is each of them used?
Answered by
hwu36
Jul 10, 2024
Answer selected by
wzhcz8902
wmma is easier to use, but slower.
`mma.sync` PTX is harder to use, but faster. It is harder to use because it takes some effort to prevent shared memory bank conflicts.
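To make the bank-conflict point concrete, here is a minimal host-side sketch in plain C++ that only counts which bank each thread of a warp would touch. It assumes the common model of 32 banks, each 4 bytes wide, and a 32-thread warp. The helper name `max_conflict_degree` is hypothetical, and row padding is just one well-known way to avoid conflicts, not necessarily the layout CUTLASS uses for its `mma.sync` kernels.

```cpp
#include <algorithm>
#include <array>

// Assumed hardware model: 32 banks, each 4 bytes wide, 32 threads per warp.
constexpr int kBanks = 32;
constexpr int kWarpSize = 32;

// Hypothetical helper: thread t reads the 4-byte word at tile[t][col], where
// consecutive rows are row_stride_words apart in shared memory. Returns the
// largest number of threads that land on the same bank (1 means no conflict).
int max_conflict_degree(int row_stride_words, int col) {
    std::array<int, kBanks> hits{};  // per-bank access counters, zero-initialized
    for (int t = 0; t < kWarpSize; ++t) {
        int word_index = t * row_stride_words + col;
        ++hits[word_index % kBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, `max_conflict_degree(32, 0)` returns 32, because every thread reading down a column of a 32-word-wide tile hits bank 0, while `max_conflict_degree(33, 0)` returns 1, because one word of padding per row spreads the same accesses over all 32 banks. With `mma.sync` you have to arrange a conflict-free shared memory layout like this yourself, which is the extra effort mentioned above.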