
Add transposer kernels for reordering field data. #29

Merged · 8 commits into xcompact3d:main · Feb 6, 2024

Conversation

semi-h
Member

@semi-h semi-h commented Jan 24, 2024

We need to reorder field data as we switch between x, y, and z orientations.

First commit adds transposing operation from x to y.

The select type stuff we have to do here is a bit annoying, mainly due to the extension of the base field type in the CUDA backend to support device arrays in Fortran. Any suggestions here are welcome @JamieJQuinn.

@semi-h semi-h requested a review from Nanoseb January 29, 2024 10:58
@semi-h changed the title from "[WIP] Add transposer kernels for reordering field data." to "Add transposer kernels for reordering field data." on Jan 29, 2024

@Nanoseb Nanoseb left a comment


Sorry I can't really comment on the core of the code, but just had a couple of remarks about code duplication. Feel free to ignore them though.

One last thing: that's the kind of feature for which a unit test makes a lot of sense and would be fairly easy to write. Take some data, transpose it several times and back to its original orientation, and check that it is still identical to the original.
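The round-trip idea above can be sketched in Python; this is only a toy model in which a field is a plain 3-D nested list and `reorder_x2y` is a simple axis swap, not the actual Fortran/CUDA kernel (the real test would drive those directly):

```python
def reorder_x2y(u):
    """Swap the first two axes of a 3-D nested list (a stand-in for the x->y kernel)."""
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    return [[[u[i][j][k] for k in range(nz)] for i in range(nx)] for j in range(ny)]

def test_reorder_round_trip():
    # Fill a field with unique values, reorder there and back, and demand
    # that the result is identical to the original.
    u = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]
    assert reorder_x2y(reorder_x2y(u)) == u
```

Because an x-to-y swap is its own inverse, one application followed by a second must reproduce the input exactly; chaining several reorders and their inverses generalises the same check.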

src/cuda/backend.f90 (resolved)

contains

attributes(global) subroutine trans_x2y_k(u_y, u_x, nz)
Collaborator

Similarly, I feel like these can be refactored into a single subroutine

Member Author

I'm not sure we can merge all these kernels into one while making sure the performance is good. This is specific to the CUDA backend though; on the OpenMP backend I think it is possible. What you have in a CUDA kernel is like the bit of code inside some number of nested loops, and any conditional checks in there can hurt performance.

Collaborator

yes ok, that's the part I wasn't sure about.

real(dp), device, pointer, dimension(:, :, :) :: u_d, u_y_d
type(dim3) :: blocks, threads

select type(u); type is (cuda_field_t); u_d => u%data_d; end select
Member

WRT comment that "select type stuff is annoying" - I agree. One possible way to make it slightly better would be

subroutine set_data_pointer(u, u_d)
  class(field_t), intent(in) :: u
  real(dp), device, pointer, dimension(:, :, :), intent(out) :: u_d
  select type(u)
  type is (cuda_field_t)
    u_d => u%data_d
  end select
end subroutine

and then the code in the main subroutine should be

call set_data_pointer(u, u_d)
call set_data_pointer(u_y, u_y_d)

which is perhaps a little better.

The other option would be to rewrite this as a wrapper subroutine accepting class(field_t), which then calls the actual subroutine with type(cuda_field_t) arguments. The wrapper would need to do the select type dance (and should probably use class default for some error checking), but the actual implementation would be free of this complication, which may or may not be an improvement.
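For illustration only, the two options can be sketched in Python, with `isinstance` standing in for select type; all names here are invented analogues of the Fortran types, not the project's actual API:

```python
class FieldT:
    """Stand-in for the abstract class(field_t)."""

class CudaFieldT(FieldT):
    """Stand-in for cuda_field_t, which carries a device array."""
    def __init__(self, data_d):
        self.data_d = data_d

def set_data_pointer(u):
    """Option 1: one small helper hides the 'select type' dance everywhere."""
    if isinstance(u, CudaFieldT):
        return u.data_d
    # The 'class default' branch: fail loudly on an unexpected field type.
    raise TypeError(f"expected CudaFieldT, got {type(u).__name__}")

def reorder_impl(u_d, u_y_d):
    """Option 2's inner routine: works on concrete device arrays only."""
    pass  # the actual kernel launch would go here

def reorder(u, u_y):
    """Option 2: a thin wrapper does the dispatch once, then calls the impl."""
    reorder_impl(set_data_pointer(u), set_data_pointer(u_y))
```

Either way the type check happens in exactly one place, and an unexpected type produces an error instead of a silently unassociated pointer.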

Member Author

I think we need something like this for sure, but I was planning to deal with it later on when fixing the allocator. The issue is mentioned in #24. I think we can pass the size we need into this set_data_pointer function and handle the bounds remapping there.

Member

I agree, better not to clutter the change.

@semi-h
Member Author

semi-h commented Jan 30, 2024

Refactored all the transpose functions into one as we agreed, but kept the CUDA kernels as they are due to performance concerns.

Also, renamed transpose and trans to reorder. The name transpose stuck from the original Xcompact3D implementation, but the operation we're carrying out is no longer a transpose, I believe. It is always local and just reorders data into the x, y, or z direction so that we can use our 1D tridiagonal solver. What do you think about this? I added all the renaming in a single commit and can easily revert if people disagree.

@pbartholomew08
Member

pbartholomew08 commented Jan 30, 2024

> Also, renamed transpose and trans into reorder. [...] What do you think about this?

Reorder seems sensible to me; alternatives (only if you're unhappy with it) could be "orient" or "rotate", but I see no need to change from reorder.


@Nanoseb Nanoseb left a comment


This looks good to me. And same, happy with reorder for the terminology.

src/common.f90 (resolved)
@semi-h
Member Author

semi-h commented Feb 5, 2024

Added a simple test, please let me know if there is anything you want to add. I plan to merge tomorrow.

@Nanoseb
Collaborator

Nanoseb commented Feb 5, 2024

Looking good to me. I think in the future we could move checkperf into a separate test module because it is used in other tests too. But this is outside the scope of this PR.


@JamieJQuinn JamieJQuinn left a comment


Minor changes, except possibly the logging system, but that should go in its own PR anyway. The whole reordering process seems much cleaner now.

call reorder_z2y<<<blocks, threads>>>(u_o_d, u_i_d, &
self%nx_loc, self%nz_loc)
case default
print *, 'Transpose direction is undefined.'
Collaborator

Would be useful to report errors to an appropriate output stream, e.g. stderr via something like https://stackoverflow.com/a/8508757. Maybe we want to do this with a global logging system though. Thoughts?

Contributor

yes please for a more appropriate output stream!

Member Author

We make use of this in the tests already. Changed the print here accordingly.

Could you give more details about the global logging system you have in mind? I haven't really used something like that. Maybe we can start an issue on this for further discussion.

Collaborator

see #30

src/cuda/kernels_reorder.f90 (two review threads, both resolved)
@@ -0,0 +1,228 @@
program test_cuda_reorder
Collaborator

These tests are great but the kernels are being directly tested and the higher level reorder_cuda doesn't get tested. Might be worth adding a few tests for that specifically.

Member Author

I think I'll keep it as is for now, but I'll be happy to have a chat later about best practices regarding testing. I'm tempted to test a single piece of functionality only, ideally isolating it from the rest, and reorder_cuda would require instantiating at least the backend and the allocator, and also setting the class variables correctly so that the reorder subroutines run correctly. The way we set these class variables is still not finalised; it is currently done in the main program. The plan is to move this into a dedicated function in the backend, which reads the inputs from a config file.

@JamieJQuinn
Collaborator

Can someone also comment here on the reasoning behind refactoring the individual reorder functions (e.g. trans_x2y) into the switch-style reorder(u_y, u, RDR_X2Y)?

To me, the calling code remains semantically identical so there's no reduction in complexity there, and there has to be a separate kernel for every direction anyway, so we haven't reduced complexity on the backend. I feel like I'm missing the advantage of introducing this new switch statement.

@Nanoseb
Collaborator

Nanoseb commented Feb 6, 2024

> Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g. trans_x2y) into the switch-style reorder(u_y, u, RDR_X2Y)?

To me reorder(... RDR_X2Y) feels cleaner because you have a single subroutine to do the work; you don't have to set up six different ones, all with the same set of input and output definitions, which is error prone. When reading the code, the single call also felt simpler and more concise.

@semi-h
Member Author

semi-h commented Feb 6, 2024

> Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g. trans_x2y) into the switch-style reorder(u_y, u, RDR_X2Y)?

It helped us remove some of the backend procedures, and we ended up with fewer lines overall.
Also, we need separate kernels in the CUDA backend, but in the OpenMP backend we can actually have a single subroutine doing all the reorderings based on the switch parameter.

@semi-h semi-h merged commit 18fe0bd into xcompact3d:main Feb 6, 2024
5 participants