-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove KernelBundle
, change signature of [get|is]ValidWorkDiv*
#2349
remove KernelBundle
, change signature of [get|is]ValidWorkDiv*
#2349
Conversation
convolutionKernel2D, | ||
alpaka::getPtrNative(bufInputAcc), | ||
alpaka::getPtrNative(outputDeviceMemory), | ||
std::data(bufInputAcc), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::data
is used for the kernel call too.
246f4a8
to
527b4bc
Compare
KernelBundle
, change signature of getValidWorkDiv*
a3b09c9
to
2ad95ea
Compare
KernelBundle
, change signature of getValidWorkDiv*
KernelBundle
, change signature of [get|is]ValidWorkDiv*
2ad95ea
to
0ebaa66
Compare
-> alpaka::KernelFunctionAttributes | ||
[[maybe_unused]] TDev const& dev, | ||
[[maybe_unused]] TKernelFn const& kernelFn, | ||
[[maybe_unused]] TArgs&&... args) -> alpaka::KernelFunctionAttributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@psychocoderHPC , irrespective if we are using the kernel bundle, what happens if the arguments passed to getFunctionAttributes
are convertible to the ones expected by the alpaka kernel, but do not match exactly ?
For example, if the alpaka kernel takes (TAcc const& acc, float x, float y, float z)
and we call getFunctionAttributes
with attributes 1.0, 2.0, 3.0
, i.e. double
, not float
.
Does it still work correctly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not have the kernel bundle anymore in this PR.
But the behavior is equal to the behavior before, I would say it is implicit casted because it follows the C*+ rules, but I am not 100% sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The evaluated function attributes should be correct because we use the same function signature we use for the kernel call. I do not see that this can be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see... the actual kernel is gpuKernel
; the arguments are passed as whatever they are to gpuKernel
, and it's only inside gpuKernel
that we call kernelFn
and potentially convert the arguments.
0ebaa66
to
88e40a6
Compare
//! \return If the work division is valid for the given accelerator device properties. | ||
template<typename TDim, typename TIdx, typename TWorkDiv> | ||
ALPAKA_FN_HOST auto isValidWorkDiv(AccDevProps<TDim, TIdx> const& accDevProps, TWorkDiv const& workDiv) -> bool | ||
template<, typename TWorkDiv, typename TDim, typename TIdx> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template<, typename TWorkDiv, typename TDim, typename TIdx> | |
template<typename TWorkDiv, typename TDim, typename TIdx> |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type during the last refactroing of the template argument order :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
- revert introduced `KernelBundle` in alpaka-group#2251 - change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel` - reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
88e40a6
to
0ec7f32
Compare
@fwyzard ready to be merged |
@@ -118,7 +118,7 @@ auto example(TAccTag const&) -> int | |||
using Data = std::uint32_t; | |||
constexpr Idx nElementsPerDim = 2; | |||
|
|||
const Vec extents(Vec::all(static_cast<Idx>(nElementsPerDim))); | |||
Vec const extents(Vec::all(static_cast<Idx>(nElementsPerDim))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vec const extents(Vec::all(static_cast<Idx>(nElementsPerDim))); | |
Vec const extents(Vec::all(nElementsPerDim)); |
since nElementsPerDim
is already of type Idx
.
typename... TArgs> | ||
ALPAKA_FN_HOST auto getValidWorkDiv( | ||
KernelCfg<TAcc, TGridElemExtent, TThreadElemExtent> const& kernelCfg, | ||
[[maybe_unused]] TDev const& dev, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why [[maybe_unused]]
?
bundeledKernel, | ||
globalThreadExtent, elementsPerThread, | ||
KernelCfg<Acc> const kernelCfg = { | ||
globalThreadExtent, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be something like globalElementsExtent
rather than globalThreadExtent
, because the number of elements per thread are also taken into account.
auto const& bundeledFillBufferKernel = alpaka::KernelBundle(fillBufferKernel, hostViewPlainPtrMdSpan); | ||
auto const hostWorkDiv | ||
= alpaka::getValidWorkDivForKernel<Host>(devHost, bundeledFillBufferKernel, threadsPerGrid, elementsPerThread); | ||
alpaka::KernelCfg<Host> const hostKernelCfg = {threadsPerGrid, elementsPerThread}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alpaka::KernelCfg<Host> const hostKernelCfg = {threadsPerGrid, elementsPerThread}; | |
alpaka::KernelCfg<Host> const hostKernelCfg = {elementsPerGrid, elementsPerThread}; |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same comment applies to the other examples.
@psychocoderHPC I've left some comments about possible improvements and suggestions. We can also merge the PR as-is, and address them later, if you prefer. |
I will fix only #2349 (comment) because it is a bug. All other are copy past from the existing code. I can address this in an seperate PR. |
@fwyzard I pushed the cheat sheet fix |
KernelBundle
from PR Fix getValidWorkDiv for CUDA and ROCm [I] #2222 #2251getValidWorkDivForKernel
,isValidWorkDivKernel
andisValidWorkDivKernel
getValidWorkDiv
andisValidWorkDiv
getValidWorkDiv
in examples and testsThis PR replaces #2340 where @fwyzard observed performance reduction #2340 (comment) and #2339 (comment)
Review