Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Half matrix and components #1708

Merged
merged 16 commits into from
Dec 3, 2024
Merged

Half matrix and components #1708

merged 16 commits into from
Dec 3, 2024

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Oct 24, 2024

This PR adds the matrix and components (like arrary/device_matrix_data) with half precision support

Also, to avoid touch the files, which are not related to this PR, I add several type list with half additionally.
For example, RealValueTypes -> RealValueTypesWithHalf and next_precision -> next_precision_with_half
(We will add bfloat16 in the future, so maybe do not use Half)

for the friend and corresponding function

friend class <prev<value_type>> // class previous precision can access this
function(class <next<value_type>>) // it can access the class with next precision because it is class with previous precision of class with next precision.

If we only use next in friend and function

friend class <next<next<value_type>>>
function(class <next<value_type>>)

Moreover, the second one does not work when we fallback the next_precision_with_half to next_precision by disabling half because next<next<value_type>> is value_type without half now. However, the first one always work.

TODO:

@yhmtsai yhmtsai requested review from a team October 24, 2024 15:30
@yhmtsai yhmtsai self-assigned this Oct 24, 2024
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. type:solver This is related to the solvers type:matrix-format This is related to the Matrix formats type:factorization This is related to the Factorizations type:reordering This is related to the matrix(LinOp) reordering mod:all This touches all Ginkgo modules. labels Oct 24, 2024
@yhmtsai yhmtsai mentioned this pull request Oct 24, 2024
12 tasks
@yhmtsai yhmtsai force-pushed the half_matrix branch 2 times, most recently from 8f3a17d to b7d4a15 Compare October 25, 2024 08:20
@yhmtsai yhmtsai added this to the Ginkgo 1.9.0 milestone Oct 25, 2024
@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Oct 25, 2024
@MarcelKoch MarcelKoch self-requested a review October 25, 2024 15:54
Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. There are still some places where the new half-enabled types are missing.

include/ginkgo/core/base/segmented_array.hpp Show resolved Hide resolved
include/ginkgo/core/base/precision_dispatch.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/matrix/row_gatherer.hpp Outdated Show resolved Hide resolved
using ComplexAndPODTypes = merge_type_list_t<ComplexValueTypes, PODTypes>;

using ComplexAndPODTypesWithHalf =
merge_type_list_t<ComplexValueTypesWithHalf, PODTypes>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still instances, where the type list without half is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not get it. Could you elaborate it more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you search for ComplexValueTypes you fill find it still in use in places where the version with half should be used instead, e.g. the reference array tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PermuteIterator test in core/test/base/iterator_factory.cpp still uses ComplexAndPODTypes without half. Is that intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Catch. I also change all ComplexAndPODTypes with half now.

core/test/utils.hpp Show resolved Hide resolved
hip/components/cooperative_groups.hip.hpp Outdated Show resolved Hide resolved
static_assert(sizeof(ValueType) == sizeof(ResultType),
"The type to reinterpret to must be of the same size as the "
"original type.");
return reinterpret_cast<ResultType&>(val);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just use memcpy here directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer if you used memcpy instead (because it is defined behavior):

Suggested change
return reinterpret_cast<ResultType&>(val);
ResultType res;
using std::memcpy;
memcpy(&res, &val, sizeof(ValueType));
return res;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think this should be changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it but I do not think it help anything but we still need the same undefined behavior outside to make it work.

omp/components/atomic.hpp Show resolved Hide resolved
reference/matrix/ell_kernels.cpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/math.hpp Show resolved Hide resolved
Copy link
Member

@thoasm thoasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 1 / 2 of my review. So far, I only have small comments.

option(GINKGO_ENABLE_HALF "Enable the use of half precision" ON)
# We do not support MSVC. SYCL will come later
if(MSVC OR GINKGO_BUILD_SYCL)
message(STATUS "HALF is not supported in MSVC, and later support in SYCL")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be rephrased since I really don't know what you mean by "and later support in SYCL".
Do you mean that SYCL does support half-precision in a later version?

Copy link
Member Author

@yhmtsai yhmtsai Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will enable the support from #1710
As the half is trivial copy again now, we might not need the device_type mapping though.

common/cuda_hip/base/math.hpp Show resolved Hide resolved
common/cuda_hip/base/math.hpp Show resolved Hide resolved
common/cuda_hip/base/math.hpp Outdated Show resolved Hide resolved
Comment on lines 119 to 120
// It is required by NVHPC 23.3, isnan is undefined when NVHPC are only as host
// compiler.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get the meaning, maybe:

Suggested change
// It is required by NVHPC 23.3, isnan is undefined when NVHPC are only as host
// compiler.
// It is required by NVHPC 23.3, `isnan` is undefined when NVHPC is only used as a host
// compiler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly,
I think cuda will go through the code twice. one for device and the other for the rest.
NVCC does not complain anything, but NVHPC will complain isnan is not defined.
TBH, I forgot whether I put __device__ or not when I encounter this issue.
I will check again

common/cuda_hip/base/types.hpp Show resolved Hide resolved
exec,
[] GKO_KERNEL(auto idx, auto array) {
if constexpr (std::is_same_v<remove_complex<ValueType>, half>) {
// __half can not be from int64_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by that?
That half can't be converted to int64_t?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, __half can not be converted from int64_t.
cuda only writes the conversion from short, int, long long and the corresponding unsigned version.
Unfortuntately, it does not accepts int64_t even if long long and int64_t are the same technically.

@yhmtsai yhmtsai force-pushed the half_type branch 2 times, most recently from 3b05fc9 to b5afcac Compare November 30, 2024 01:30
@yhmtsai yhmtsai force-pushed the half_matrix branch 2 times, most recently from 2aa59e2 to 70a8f26 Compare November 30, 2024 18:36
@yhmtsai yhmtsai requested a review from MarcelKoch December 1, 2024 00:41
@yhmtsai yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. 1:ST:skip-full-test and removed 1:ST:ready-for-review This PR is ready for review labels Dec 3, 2024
Base automatically changed from half_type to develop December 3, 2024 01:08
@yhmtsai yhmtsai merged commit 76ef161 into develop Dec 3, 2024
9 of 11 checks passed
@yhmtsai yhmtsai deleted the half_matrix branch December 3, 2024 01:11
@ginkgo-bot
Copy link
Member

Error: PR already merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. 1:ST:skip-full-test mod:all This touches all Ginkgo modules. reg:build This is related to the build system. reg:testing This is related to testing. type:factorization This is related to the Factorizations type:matrix-format This is related to the Matrix formats type:reordering This is related to the matrix(LinOp) reordering type:solver This is related to the solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants