Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove KernelBundle, change signature of [get|is]ValidWorkDiv* #2349

Conversation

psychocoderHPC
Copy link
Member

@psychocoderHPC psychocoderHPC commented Aug 9, 2024

  • revert introduced KernelBundle from PR Fix getValidWorkDiv for CUDA and ROCm [I] #2222 #2251
  • change signature of getValidWorkDivForKernel,isValidWorkDivKernel and isValidWorkDivKernel
    • isValidWorkDiv is using the work division as first parameter and all dependencies e.g. device, kernel, ... followed
  • reuse old naming getValidWorkDiv and isValidWorkDiv
  • use new interface for getValidWorkDiv in examples and tests

This PR replaces #2340 where @fwyzard observed performance reduction #2340 (comment) and #2339 (comment)

Review

  • this PR is split into two commits, first changing the interface and second usage of the interface

@psychocoderHPC psychocoderHPC added this to the 1.2.0 milestone Aug 9, 2024
@psychocoderHPC psychocoderHPC requested a review from fwyzard August 9, 2024 14:02
convolutionKernel2D,
alpaka::getPtrNative(bufInputAcc),
alpaka::getPtrNative(outputDeviceMemory),
std::data(bufInputAcc),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::data is used for the kernel call too.

@psychocoderHPC psychocoderHPC force-pushed the topic-getValidWorkdivRefactoring branch 3 times, most recently from 246f4a8 to 527b4bc Compare August 9, 2024 14:48
@psychocoderHPC psychocoderHPC changed the title Topic get valid workdiv refactoring remove KernelBundle, change signature of getValidWorkDiv* Aug 9, 2024
@psychocoderHPC psychocoderHPC force-pushed the topic-getValidWorkdivRefactoring branch 5 times, most recently from a3b09c9 to 2ad95ea Compare August 9, 2024 16:06
@psychocoderHPC psychocoderHPC changed the title remove KernelBundle, change signature of getValidWorkDiv* remove KernelBundle, change signature of [get|is]ValidWorkDiv* Aug 9, 2024
@psychocoderHPC psychocoderHPC force-pushed the topic-getValidWorkdivRefactoring branch from 2ad95ea to 0ebaa66 Compare August 9, 2024 16:30
-> alpaka::KernelFunctionAttributes
[[maybe_unused]] TDev const& dev,
[[maybe_unused]] TKernelFn const& kernelFn,
[[maybe_unused]] TArgs&&... args) -> alpaka::KernelFunctionAttributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@psychocoderHPC , irrespective if we are using the kernel bundle, what happens if the arguments passed to getFunctionAttributes are convertible to the ones expected by the alpaka kernel, but do not match exactly ?

For example, if the alpaka kernel takes (TAcc const& acc, float x, float y, float z) and we call getFunctionAttributes with attributes 1.0, 2.0, 3.0 , i.e. double, not float.

Does it still work correctly ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have the kernel bundle anymore in this PR.
But the behavior is equal to the behavior before, I would say it is implicit casted because it follows the C*+ rules, but I am not 100% sure.

Copy link
Member Author

@psychocoderHPC psychocoderHPC Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The evaluated function attributes should be correct because we use the same function signature we use for the kernel call. I do not see that this can be different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see... the actual kernel is gpuKernel; the arguments are passed as whatever they are to gpuKernel, and it's only inside gpuKernel that we call kernelFn and potentially convert the arguments.

@psychocoderHPC psychocoderHPC force-pushed the topic-getValidWorkdivRefactoring branch from 0ebaa66 to 88e40a6 Compare August 13, 2024 11:45
//! \return If the work division is valid for the given accelerator device properties.
template<typename TDim, typename TIdx, typename TWorkDiv>
ALPAKA_FN_HOST auto isValidWorkDiv(AccDevProps<TDim, TIdx> const& accDevProps, TWorkDiv const& workDiv) -> bool
template<, typename TWorkDiv, typename TDim, typename TIdx>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
template<, typename TWorkDiv, typename TDim, typename TIdx>
template<typename TWorkDiv, typename TDim, typename TIdx>

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type during the last refactroing of the template argument order :-(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

- revert introduced `KernelBundle` in alpaka-group#2251
- change signature of `getValidWorkDivForKernel`,`isValidWorkDivKernel` and `isValidWorkDivKernel`
- reuse old naming `getValidWorkDiv` and `isValidWorkDiv`
@psychocoderHPC psychocoderHPC force-pushed the topic-getValidWorkdivRefactoring branch from 88e40a6 to 0ec7f32 Compare August 14, 2024 07:59
@psychocoderHPC
Copy link
Member Author

@fwyzard ready to be merged

@@ -118,7 +118,7 @@ auto example(TAccTag const&) -> int
using Data = std::uint32_t;
constexpr Idx nElementsPerDim = 2;

const Vec extents(Vec::all(static_cast<Idx>(nElementsPerDim)));
Vec const extents(Vec::all(static_cast<Idx>(nElementsPerDim)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Vec const extents(Vec::all(static_cast<Idx>(nElementsPerDim)));
Vec const extents(Vec::all(nElementsPerDim));

since nElementsPerDim is already of type Idx.

typename... TArgs>
ALPAKA_FN_HOST auto getValidWorkDiv(
KernelCfg<TAcc, TGridElemExtent, TThreadElemExtent> const& kernelCfg,
[[maybe_unused]] TDev const& dev,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why [[maybe_unused]] ?

bundeledKernel,
globalThreadExtent, elementsPerThread,
KernelCfg<Acc> const kernelCfg = {
globalThreadExtent,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be something like globalElementsExtent rather than globalThreadExtent, because the number of elements per thread are also taken into account.

auto const& bundeledFillBufferKernel = alpaka::KernelBundle(fillBufferKernel, hostViewPlainPtrMdSpan);
auto const hostWorkDiv
= alpaka::getValidWorkDivForKernel<Host>(devHost, bundeledFillBufferKernel, threadsPerGrid, elementsPerThread);
alpaka::KernelCfg<Host> const hostKernelCfg = {threadsPerGrid, elementsPerThread};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
alpaka::KernelCfg<Host> const hostKernelCfg = {threadsPerGrid, elementsPerThread};
alpaka::KernelCfg<Host> const hostKernelCfg = {elementsPerGrid, elementsPerThread};

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment applies to the other examples.

@fwyzard
Copy link
Contributor

fwyzard commented Aug 14, 2024

@psychocoderHPC I've left some comments about possible improvements and suggestions.

We can also merge the PR as-is, and address them later, if you prefer.

fwyzard
fwyzard previously approved these changes Aug 14, 2024
@psychocoderHPC
Copy link
Member Author

@psychocoderHPC I've left some comments about possible improvements and suggestions.

We can also merge the PR as-is, and address them later, if you prefer.

I will fix only #2349 (comment) because it is a bug. All other are copy past from the existing code. I can address this in an seperate PR.

@psychocoderHPC
Copy link
Member Author

@fwyzard I pushed the cheat sheet fix

@SimeonEhrig SimeonEhrig merged commit d7c459d into alpaka-group:develop Aug 15, 2024
22 checks passed
@psychocoderHPC psychocoderHPC deleted the topic-getValidWorkdivRefactoring branch August 16, 2024 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants