-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix getValidWorkDiv for CUDA and ROCm [I] #2222 #2251
Fix getValidWorkDiv for CUDA and ROCm [I] #2222 #2251
Conversation
e8dca57
to
6400b7f
Compare
503e26e
to
f0dd41e
Compare
30694dc
to
de0aed5
Compare
589c616
to
3d3322e
Compare
offline discussed in the developer meeting:
|
0a04381
to
5a21705
Compare
please rebase against dev to get the fix from #2294 in and remove draft from the PR if it is ready |
Co-authored-by: Andrea Bocci <[email protected]>
5a21705
to
b7f84a5
Compare
Done, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two tiny things, than it is IMO mergable
{ | ||
// Expected valid workdiv for this kernel. These values might change depending on the GPU type and compiler | ||
// therefore commented out. Number of registers used by kernel might limit threads per block value differently | ||
// for different GPUs. CHECK(workDiv == WorkDiv{Vec{8, 729444}, Vec{1, 736}, Vec{1, 1}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment looks outdated with this hard coded values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not outdated, but i changed the comment since could be confusing and values would change for different GPU versions.
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
…2349) * remove `KernelBundle`, change signature of `getValidWorkDiv*` - revert introduced `KernelBundle` in #2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv` * use new interface for `getValidWorkDiv` * fix cheat sheet
getValidWorkDiv
has a bug. It only considers device hard properties (TApi::getDeviceProperties()
); does not consider the kernel function. Actually kernel function properties can limit number of threads per block.In the case of CUDA and ROCm, the function
getValidWorkDiv
should useTApi::funcGetAttributes(&funcAttrs, kernelName)
together withTApi::getDeviceProperties()
.I have added 3 functions.
getFunctionAttributes
,getValidWorkDivKernel
andisValidWorkDivKernel
. One function intrinsic to Alpaka changed:subDivideGridElems
. Now has an additional argument; "threads per block", if the value is zero the function uses device hard properties as before.Next PR: The 2 functions
getValidWorkDivKernel
andisValidWorkDivKernel
will be used in Examples and Tests instead of getValidWorkDiv and isValidWorkDiv at the next PR. The names can be determined.Credits to @psychocoderHPC for the PR.
Tests: The tests for 1D and 2D workdivs were added.