-
Hi everyone,

I noticed that in CUTLASS 3.4 (#1286), there was a change in cute/atom/mma_atom.hpp that altered the signature of TiledMMA. Specifically:

CUTLASS 3.4:

    // @tparam MMA_Atom The MMA_Atom to use in the TiledMMA
    // @tparam AtomLayoutMNK The MNK-tiling of the Atom to be performed.
    // @tparam PermutationMNK Permutations to apply to each MNK-mode before tiling for the Atom.
    template <class MMA_Atom,
              class AtomLayoutMNK,
              class PermutationMNK = Tile<Underscore,Underscore,Underscore>>
    struct TiledMMA : MMA_Atom { ... }

Before:

    template <class MMA_Atom,
              class AtomLayoutMNK   = Layout<Shape<_1,_1,_1>>,
              class ValLayoutMNK    = Layout<Shape<_1,_1,_1>>,
              class PermutationsMNK = Tile<Underscore,Underscore,Underscore>>
    struct TiledMMA : MMA_Atom { ... }

So, what is the PermutationMNK parameter? To be clear, thanks to this Zhihu post, I kind of understood how the old version worked. Now I have

    using TiledMma = decltype(
        make_tiled_mma(
            MmaAtom{},
            Layout<Shape<_1, _1, _1>>{},  // this should be AtomLayoutMNK
            Layout<Shape<_2, _2, _2>>{}   // this should be PermutationMNK, but what does it do now?
        )
    );

but I don't know what is going on with what I passed in.

Hoping for an explanation. Thanks!
Replies: 4 comments 9 replies
-
Thanks for the question, my update to the CuTe documentation and examples is still pending but should be available soon.

The easiest way to think about it is that the Permutation parameter is a Tiler for the MNK modes of the MMA. That is, it is a set of three layouts that are applied to the M-mode, N-mode, and K-mode individually before applying the TV-layouts and projective slicing. These layouts can act to permute the M-mode, N-mode, and K-mode individually to make the TV-partitioning patterns more manageable/intuitive and effectively interleave individual MMAs.

I'll start with some examples and then get to your case. Let's start with SM80_8x8x4_F64F64F64F64_TN:

    TiledMMA tiled_mma = make_tiled_mma(SM80_8x8x4_F64F64F64F64_TN{});
    print_latex(tiled_mma);

The above is equivalent to

    TiledMMA tiled_mma = make_tiled_mma(SM80_8x8x4_F64F64F64F64_TN{},
                                        Layout<Shape<_1,_1,_1>>{},
                                        Tile<_8,_8,_4>{});
    print_latex(tiled_mma);

as the atom already has a natural tile size of 8x8x4. But we can immediately expand this "tile size" up to 8x16x8 instead:

    TiledMMA tiled_mma = make_tiled_mma(SM80_8x8x4_F64F64F64F64_TN{},
                                        Layout<Shape<_1,_1,_1>>{},  // AtomLayout
                                        Tile<_8,_16,_8>{});         // Tiler
    print_latex(tiled_mma);

This doesn't actually affect the partitioning of input/output tensors because, by convention, only a single atom is ever partitioned out. It will affect the output of

Continuing, I see those four values that

    TiledMMA tiled_mma = make_tiled_mma(SM80_8x8x4_F64F64F64F64_TN{},
                                        Layout<Shape<_1,_1,_1>>{},       // AtomLayout
                                        Tile<_8,                         // Permutation on M, equivalent to 8:1, identity
                                             Layout<Shape <_2,_4,_2>,
                                                    Stride<_1,_4,_2>>,   // Permutation on N, size 16
                                             _8>{});                     // Permutation on K, equivalent to 8:1, identity
    print_latex(tiled_mma);

That layout

Your example is doing something a bit silly, but perfectly functional.

    using TiledMma = decltype(
        make_tiled_mma(
            MmaAtom{},
            Layout<Shape<_1, _1, _1>>{},  // this should be AtomLayoutMNK
            Layout<Shape<_2, _2, _2>>{}   // this should be PermutationMNK, but what does it do now?
        )
    );

Because

    make_tiled_mma(MMA_Atom<MMA_Op> const& mma_atom,
                   MMAThrLayout     const& thr_layout   = {},
                   Permutations     const& permutations = {})
    {
      auto thr_layout_mnk  = append<3>(thr_layout, Layout<_1,_0>{});
      auto permutation_mnk = append<3>(permutations, _);
      return TiledMMA<MMA_Atom<MMA_Op>,
                      decltype(thr_layout_mnk),
                      decltype(permutation_mnk)>{mma_atom, thr_layout_mnk};
    }

you're getting the following as a result:

    TiledMMA<AtomType,
             Layout<Shape<_1,_1,_1>>,            // AtomLayoutMNK
             Tile<Layout<Shape<_2,_2,_2>>,       // Permutation on M, (2,2,2):(1,2,4), identity of size 8
                  Underscore,                    // Permutation on N, noop identity
                  Underscore>>                   // Permutation on K, noop identity

So you can see that your Layout is becoming a rank-3 Tiler, but the permutations on each mode are all identities anyway and have little to no effect.

Hope that helps! I'll try to get explanations and examples like these into the documentation as soon as possible.
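The "(2,2,2):(1,2,4), identity of size 8" remark in the reply can be checked by hand. The following is a minimal standalone sketch in plain C++, not CuTe code: the names `Int3`, `compact_col_major`, `offset_of`, and `is_identity` are hypothetical helpers introduced here for illustration, and the sketch assumes the usual CuTe conventions that a Layout given only a Shape gets compact column-major strides and that a 1-D index is split into a coordinate with the leftmost mode varying fastest.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper type for this sketch (not a CuTe type).
using Int3 = std::array<int, 3>;

// Compact column-major strides: each stride is the product of the
// extents to its left, so shape (2,2,2) gets strides (1,2,4).
inline Int3 compact_col_major(const Int3& shape) {
  return {1, shape[0], shape[0] * shape[1]};
}

// Offset of a 1-D index: split it into a coordinate with the
// leftmost mode varying fastest, then dot it with the strides.
inline int offset_of(const Int3& shape, const Int3& stride, int idx) {
  int offset = 0;
  for (std::size_t m = 0; m < shape.size(); ++m) {
    offset += (idx % shape[m]) * stride[m];
    idx /= shape[m];
  }
  return offset;
}

// True when the layout maps every index in [0, size) to itself.
inline bool is_identity(const Int3& shape, const Int3& stride) {
  const int size = shape[0] * shape[1] * shape[2];
  for (int i = 0; i < size; ++i) {
    if (offset_of(shape, stride, i) != i) return false;
  }
  return true;
}
```

Under these assumptions, `is_identity({2,2,2}, compact_col_major({2,2,2}))` is true, which is why the layout passed in the question permutes nothing in the M-mode, while `is_identity({2,4,2}, {1,4,2})` is false, so the N-mode layout from the earlier example really does reorder elements.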
-
Nice explanation! But I still have a problem with this:

    Layout<Shape <_2,_4,_2>,
           Stride<_1,_4,_2>>

It seems to take the element indices in this order: 0 1 4 5 8 9 12 13 2 3 6 7 10 11 14 15. But in your case, doesn't it need to take them along the N axis like 0 1 8 9 2 3 10 11 4 5 12 13 6 7 14 15?
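The first sequence in this comment can be reproduced by evaluating the layout function directly. This is a small standalone sketch in plain C++ rather than CuTe itself: `layout_offset` and `n_mode_permutation` are hypothetical names introduced for illustration, and the sketch assumes the CuTe convention that a 1-D index is converted to a coordinate with the leftmost mode varying fastest before being dotted with the strides.

```cpp
#include <array>
#include <cstddef>

// Evaluate the rank-3 layout shape:stride at a 1-D index.
// The index is peeled apart mode by mode (leftmost mode fastest),
// and each coordinate is multiplied by that mode's stride.
inline int layout_offset(const std::array<int, 3>& shape,
                         const std::array<int, 3>& stride,
                         int idx) {
  int offset = 0;
  for (std::size_t m = 0; m < shape.size(); ++m) {
    offset += (idx % shape[m]) * stride[m];
    idx /= shape[m];
  }
  return offset;
}

// Offsets produced by (2,4,2):(1,4,2) for indices 0..15.
inline std::array<int, 16> n_mode_permutation() {
  const std::array<int, 3> shape{2, 4, 2};
  const std::array<int, 3> stride{1, 4, 2};
  std::array<int, 16> out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = layout_offset(shape, stride, static_cast<int>(i));
  }
  return out;
}
```

Under that assumption, `n_mode_permutation()` yields 0 1 4 5 8 9 12 13 2 3 6 7 10 11 14 15, which is the first ordering listed in the comment. The sketch only shows what the layout function evaluates to; how the Tiler then uses that mapping is a separate question.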
-
How does the layout of this upper-right matrix in the N dimension correspond to Layout<Shape<_2,_4,_2>, Stride<_1,_4,_2>>? Thank you very much for your reply!
-
Is it right to think