Apply universal gemm to bwd_weight_cshuffle operator #1658
base: develop
Conversation
Force-pushed from 118fd3f to 932e6a0
I will continue the review later.
example/20_grouped_conv_bwd_weight/grouped_conv_bwd_weight_xdl_bf16.cpp (resolved, outdated)
include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_bwd_weight_xdl_cshuffle.hpp (resolved, outdated)
Hi @mozga-amd, please remove merged-groups support for this kernel, then update the instances. After that we can review it again.
Force-pushed from 932e6a0 to 860433e
Nothing to review for docs. If anything needs to be documented, let me know.
I added a few comments. Please focus first on the API changes. We have two options, but please take some local measurements first:
- If gridwise gemm v3 is better in every case, can we move BlkGemmPipeSched and BlkGemmPipelineVer to the end of the template parameter list and keep the API consistent? (See the sketch after this list.)
- If gridwise gemm v3 is better, but not in every case, can we restore the previous implementation and copy your new DeviceGroupedConvBwdWeight_Xdl_CShuffle as DeviceGroupedConvBwdWeight_Xdl_CShuffleV3?
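For the first option, here is a minimal sketch of the trailing-defaults idea. This is not the real CK declaration: the struct name, the abbreviated parameter list, and the local enum stand-ins for `ck::BlockGemmPipelineScheduler` / `ck::BlockGemmPipelineVersion` are all illustrative.

```cpp
namespace ck_sketch {

// Local stand-ins for the ck:: pipeline enums, so the sketch is self-contained.
enum class BlockGemmPipelineScheduler { Intrawave, Interwave };
enum class BlockGemmPipelineVersion { v1, v2, v3 };

template <typename ADataType,
          typename BDataType,
          typename CDataType,
          // ... all existing tuning parameters stay here, unchanged and in order ...
          BlockGemmPipelineScheduler BlkGemmPipeSched = BlockGemmPipelineScheduler::Intrawave,
          BlockGemmPipelineVersion BlkGemmPipelineVer = BlockGemmPipelineVersion::v1>
struct DeviceGroupedConvBwdWeight_Xdl_CShuffle_Sketch
{
};

// Existing call sites compile unchanged, because the trailing parameters default:
using OldStyleInstance = DeviceGroupedConvBwdWeight_Xdl_CShuffle_Sketch<float, float, float>;

// New instances opt into the v3 pipeline explicitly:
using V3Instance = DeviceGroupedConvBwdWeight_Xdl_CShuffle_Sketch<
    float, float, float,
    BlockGemmPipelineScheduler::Intrawave,
    BlockGemmPipelineVersion::v3>;

} // namespace ck_sketch
```

Because the new parameters default to the current scheduler and the v1 pipeline, every existing instantiation keeps compiling, while new instances can select v3 explicitly.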
example/20_grouped_conv_bwd_weight/grouped_conv_bwd_weight_xdl_bf16.cpp (resolved, outdated)
```cpp
ck::BlockGemmPipelineVersion::v1, // BlkGemmPipelineVer
ComputeTypeA,                     // ComputeTypeA
ComputeTypeB>;                    // ComputeTypeB
// clang-format on
```
include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_bwd_weight_xdl_cshuffle.hpp (resolved, outdated)
```cpp
const index_t K,
const std::array<index_t, NDimSpatial + 3>& output_strides)
{
    const index_t BatchStride = output_strides[0];
```
For out, in, and wei: can we add a condition like `if (NumGroupsToMerge == 1) { /* create the descriptor in the typical way, as in v1 */ }`? Or, if possible, use transform_v1 in device_op? I think it could impact performance.
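A minimal sketch of the guard being suggested, under assumed names: `MakeOutDescriptor_v1` and `MakeOutDescriptor_Merged` are hypothetical stand-ins for the actual v1-style and merged-group descriptor builders, and the signature is simplified from the snippet above. Since `NumGroupsToMerge` is a compile-time parameter, `if constexpr` makes the fast path free at runtime.

```cpp
#include <array>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck's index_t

// Hypothetical helpers standing in for the v1 and merged-group descriptor paths.
template <index_t NDimSpatial>
void MakeOutDescriptor_v1(const std::array<index_t, NDimSpatial + 3>& /*output_strides*/) {}

template <index_t NDimSpatial>
void MakeOutDescriptor_Merged(const std::array<index_t, NDimSpatial + 3>& /*output_strides*/) {}

// Suggested guard: when no groups are merged, create the descriptor the
// typical way (as in v1) and only pay for the merged-group transform otherwise.
template <index_t NDimSpatial, index_t NumGroupsToMerge>
void MakeOutDescriptor(const std::array<index_t, NDimSpatial + 3>& output_strides)
{
    if constexpr(NumGroupsToMerge == 1)
    {
        MakeOutDescriptor_v1<NDimSpatial>(output_strides);
    }
    else
    {
        MakeOutDescriptor_Merged<NDimSpatial>(output_strides);
    }
}

// Usage sketch: 2D convolution (NDimSpatial = 2, so 5 stride entries), no merged groups.
int main()
{
    const std::array<index_t, 5> output_strides{1000, 100, 10, 5, 1}; // illustrative values
    MakeOutDescriptor<2, 1>(output_strides); // compiles down to the v1-style path only
}
```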