-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matrix Multiplication example using MdSpan #2317
Matrix Multiplication example using MdSpan #2317
Conversation
@@ -0,0 +1,53 @@ | |||
# | |||
# Copyright 2023 Erik Zenker, Benjamin Worpitz, Jan Stephan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be modiefied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok changed to 2024 and SPDX-License-Identifier: MPL-2.0
// Needed for running example for all backends available; one by one | ||
#include <alpaka/example/ExecuteForEachAccTag.hpp> | ||
|
||
#include <experimental/mdspan> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be wrapped in a check on the C++ standard, so if we are using c++23 we take <mdspan>
and otherise if we are using the version that comes with alpaka we take "experimental/mdspan"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree because the version shipped with alpaka is different to the standard implementation in an important point. The alpaka version uses the operator()
[1] to access elements. The standard implementation uses the operator[]
because since C++ 23 more than one argument is allowed. Therefore we would need to add a preporcessor if-else or wrapper function each time the memory is accessed.
[1] The experimental implementation provides also the operator[]
if we use C++ 23. But we also support C++17 and 20.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, I see.
On the other hand, alpaka
(the library) does not use the mdspan
objects, it only returns them.
I think that, if somebody uses alpaka
in a c++23 environment, they should get the official mdspan
, not the experimental version (like we do for boost::atomic_ref
vs std::atomic_ref
).
Can we #if __cplusplus
the examples to use ()
for c++17/c++20, and []
for c++23 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I think this can be a little bit more work.
The implementation we use creates a defines MDSPAN_USE_BRACKET_OPERATOR
. Than it looks like this: https://github.com/SimeonEhrig/MyCpp/blob/28d20632657697533184f95c883739718f8397ad/features/23/mdspan/hello_mspan/main.cpp#L44
I expect, that the define will be not created by the standard library and we need to create it by our self.
Another point is, that our implementation annotates the functions with __host__
and __device__
to make it compatible to CUDA and HIP. At the moment, Nvidia does not provide a stl implementation and I'm not sure, if AMD does it: https://en.cppreference.com/w/cpp/compiler_support#cpp23
But Clang provides already an implementation. Therefore, in the worst case wit detect the stl implementation of Clang, which is maybe not annotated and tries to compile it with the CUDA or HIP backend. It this case, we need a workaround selection mechanism, if the mdspan implementation is not annotated.
Long story short, I agree that we should be forward compatible but I think we need to solve this in an extra PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long story short, I agree that we should be forward compatible but I think we need to solve this in an extra PR.
OK for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i will open a related issue not to forget, or @SimeonEhrig could you open since you know the details...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened it already: #2319
//! \param C Output matrix where the result of A * B will be stored | ||
//! \param K The shared dimension between A and B | ||
template<typename TAcc, typename MdSpan> | ||
ALPAKA_FN_ACC void operator()(TAcc const& acc, MdSpan A, MdSpan B, MdSpan C, Idx K) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't K
be derived from A
and B
, with a check that the dimensions match ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Thanks done.
// Wait for the kernel to finish | ||
alpaka::wait(queue); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will keep this and added a comment saying "if the queue is blocking this is not needed."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this wait
is never needed, also is the queue is not blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wait
below (after the memcpy
) is indeed needed, and it should always be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the last wait
(after the memcpy
), if the queue is blocking the host is blocked by memcpy
hence no need to wait
after memcpy
? (and the queue is blocking-queue using Queue = alpaka::Queue<Acc, alpaka::Blocking>;
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
If a queue is non-blocking, it is necessary to do alpaka::wait(queue)
after the memcpy
and before accessing the data on the host.
If a queue is blocking, it is not necessary to do alpaka::wait(queue)
.
However, calling alpaka::wait(queue)
should be cheap, though not completely free.
So maybe we can keep it anyway ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I used wait
only after memcpy
statements and added no comments... Although the queue
is blocking, as you said it is not expensive.
I like the idea of having examples with |
By "Explicitly" you mean while passing to the kernel and instead of
|
No, I only meant that instead of template<typename TAcc, typename MdSpan>
ALPAKA_FN_ACC void operator()(TAcc const& acc, MdSpan A, MdSpan B, MdSpan C) const { ... } I would prefer something like template<typename TAcc,
typename T,
typename Extents,
typename LayoutPolicy,
typename AccessorPolicy>
ALPAKA_FN_ACC void operator()(TAcc const& acc,
mdspan<T, Extents, LayoutPolicy, AccessorPolicy> A,
mdspan<T, Extents, LayoutPolicy, AccessorPolicy> B,
mdspan<T, Extents, LayoutPolicy, AccessorPolicy> C) const
{ ... } However, I acknowledge that this is a lot more verbose, and maybe easier to write once we enable c++20 and concepts. |
b49a304
to
f874872
Compare
|
||
//! Specialization for mdspan with four template arguments | ||
template<typename ElementType, typename Extents, typename LayoutPolicy, typename AccessorPolicy> | ||
struct isMdspan<std::experimental::mdspan<ElementType, Extents, LayoutPolicy, AccessorPolicy>> : std::true_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename this to IsMdspan
?
and add
template <typename T>
inline constexpr bool is_mdspan = IsMdspan<T>::value;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks.
Your suggestion is good and much expressive, but for simplicity i pass TMdSpan then check whether it is exactly std::experimental::mdspan.
|
f874872
to
5c2a4f1
Compare
Good idea, that's a good compromise. |
cdda967
to
e80097e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rebase to use getValidWorkDivKErnel
auto mdDevC = alpaka::experimental::getMdSpan(bufDevC); | ||
|
||
// Let alpaka calculate good block and grid sizes given our full problem extent. | ||
auto const workDiv = alpaka::getValidWorkDiv<Acc>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use getValidWorkDivKernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased, done, thanks.
d1d1c95
to
aa6783e
Compare
offline discussed with @mehmetyusufoglu : we will remove the add example because it is 99.9% equal to the multiplication and IMO the code duplication is not required. |
aa6783e
to
0f68e49
Compare
0f68e49
to
74d8315
Compare
693d146
to
a3d1443
Compare
|
||
std::cout << "Multiplication of matrices of size " << M << "x" << K << " and " << K << "x" << N << " using mdspan " | ||
<< (success ? "succeeded" : "failed") << "!" << std::endl; | ||
return EXIT_SUCCESS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if success
is false you should end with EXIT_FAILURE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, added. Thanks a lot. (Tested by ctest
for both true-positive, true-negative cases. ) Other examples like convolution1d,2d,2dwithMdSpan are correct in this sense.
a3d1443
to
bb1f4e6
Compare
bb1f4e6
to
ed330b1
Compare
This matrix multiplication example is using
mdspan
data structure for passing the data to/from kernel and for filling the data in the host conveniently.By using mdspan passing data size and pitch values to the kernel for each 2d array is not needed. Filling the host data is not difficult as well.
If these workshop examples are found to be useful for the users of Alpaka in general; this PR can be merged as well.