Add transposer kernels for reordering field data. #29
WRT comment that "select type stuff is annoying" - I agree. One possible way to make it slightly better would be
and then the code in the main subroutine should be
which is perhaps a little better.
The other option would be to rewrite this as a wrapper subroutine accepting `class(field_t)`, which then calls the actual subroutine with `type(cuda_field_t)` arguments. The wrapper subroutine would need to do the select type dance (and should probably use `class default` for some error checking), but the actual implementation would be free of this complication, which may or may not be an improvement.
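A minimal sketch of the wrapper idea, not the code from this PR: the type names `field_t` and `cuda_field_t` come from the comment above, but their contents, the subroutine names `scale_field` and `scale_field_impl`, and the `ierr` error flag are assumptions made up for illustration.

```fortran
module field_wrapper_sketch
  implicit none
  private
  public :: field_t, cuda_field_t, scale_field

  ! Stand-ins for the project's types; the real definitions differ.
  type :: field_t
  end type field_t

  type, extends(field_t) :: cuda_field_t
    real, allocatable :: data(:)
  end type cuda_field_t

contains

  ! Wrapper: accepts the polymorphic type and resolves it exactly once.
  subroutine scale_field(u, factor, ierr)
    class(field_t), intent(inout) :: u
    real, intent(in) :: factor
    integer, intent(out) :: ierr

    ierr = 0
    select type (u)
    type is (cuda_field_t)
      call scale_field_impl(u, factor)
    class default
      ! Flag an unsupported dynamic type instead of silently continuing.
      ierr = 1
    end select
  end subroutine scale_field

  ! Implementation: works on the concrete type, no select type needed.
  subroutine scale_field_impl(u, factor)
    type(cuda_field_t), intent(inout) :: u
    real, intent(in) :: factor

    u%data = factor * u%data
  end subroutine scale_field_impl

end module field_wrapper_sketch
```

The trade-off is one extra subroutine per operation in exchange for an implementation body that only ever sees the concrete type.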
I think we need something like this for sure, but I was planning to deal with it later on when fixing the allocator. The issue is mentioned in #24. I think we can pass the size we need into this `set_data_pointer` function and handle the bounds remapping there.
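One way the bounds remapping could look, as a hedged sketch only: the name `set_data_pointer` is taken from the comment above, but this standalone signature (a flat buffer, a `dims` argument, and an output pointer) is an assumption and is not the project's actual interface. It relies on standard Fortran pointer rank remapping onto a rank-1 target.

```fortran
module bounds_remap_sketch
  implicit none
contains

  ! Hypothetical signature: maps a flat buffer onto a 3D view of the
  ! requested size. The caller's buffer must have the TARGET attribute.
  subroutine set_data_pointer(buffer, dims, ptr)
    real, target, intent(inout) :: buffer(:)
    integer, intent(in) :: dims(3)
    real, pointer, intent(out) :: ptr(:, :, :)

    if (product(dims) > size(buffer)) then
      ! Requested view does not fit in the buffer; return a null pointer.
      nullify(ptr)
      return
    end if

    ! Pointer rank remapping: 1D storage viewed as dims(1) x dims(2) x dims(3).
    ptr(1:dims(1), 1:dims(2), 1:dims(3)) => buffer(1:product(dims))
  end subroutine set_data_pointer

end module bounds_remap_sketch
```

With this shape, kernels can index the field with the bounds they expect while the allocator keeps handing out fixed-size flat blocks.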
I agree, better not to clutter the change.
Similarly, I feel like these can be refactored into a single subroutine.
I'm not sure if we can merge all these kernels into one while making sure the performance is good. This is specific to the CUDA backend though; on the OpenMP backend I think it is possible. What you have in a CUDA kernel is like the bit of code you would have inside some number of nested loops, and any conditional checks in there can hurt performance.
Yes, OK, that's the part I wasn't sure about.