Advanced Topics #20

Open
lucaparisi91 opened this issue Aug 15, 2024 · 1 comment

lucaparisi91 commented Aug 15, 2024

This section is meant mostly as an overview of what can be done with advanced features, rather than a detailed treatment.
No base material available.

  • Custom mappers ( Create mappers for data structures #1 )
    • mapping of a class containing pointers
  • Custom memory allocators ( openmp allocators #4 )
    • Per-thread memory allocation (firstprivate) vs shared variables (which may live in shared, global, or local memory, as opposed to being passed as a kernel argument)
    • Pinned memory allocation on CPU
    • shared memory allocations
  • Concurrency
    • Submit kernels from multiple threads (Streams with openmp #3). Demonstrates using different CUDA/HIP streams.
    • Use OpenMP asynchronously (stretch) (asynchronous offload #7). Demonstrates overlapping memory transfers with execution, or running multiple small kernels that would be difficult to merge into one bigger kernel.
  • Interoperability ( Calling cuda/rocm libraries from openmp #2 )
    • Demonstrate how to use a variable mapped from OpenMP in CUDA, and how to use a variable allocated from CUDA in OpenMP.
    • Example of using cuFFT (or any CUDA/ROCm numerical library) together with OpenMP
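
For the custom mapper item, a minimal sketch might look like the following. The `Field` struct and the `scale` function are hypothetical names chosen for illustration and are not taken from the course material. The idea is that `declare mapper` tells OpenMP to map the array behind the pointer whenever a `Field` is mapped. Compiler support for `declare mapper` varies, so this may need a recent compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class containing a pointer (illustrative only).
struct Field {
    double* data;   // host buffer, not owned by the struct
    std::size_t n;  // number of elements behind data
};

// Whenever a Field is mapped, also map the n doubles that data points to.
#pragma omp declare mapper(Field f) map(f, f.data[0:f.n])

// Multiply every element by a inside a target region.
// Without OpenMP support or without a device, the loop simply runs on the host.
void scale(Field& f, double a) {
    assert(f.data != nullptr);
    #pragma omp target teams distribute parallel for map(tofrom: f)
    for (std::size_t i = 0; i < f.n; ++i) {
        f.data[i] *= a;
    }
}
```

A caller would wrap an existing buffer, for example `Field f{buf, 3}; scale(f, 2.0);`, and read the updated values back from `buf` afterwards.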
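
The asynchronous offload item could be sketched roughly as below, assuming a hypothetical function `add_and_scale` that is not part of the issue. It submits several small kernels as deferred target tasks with `nowait` and orders them with `depend`, so the two independent kernels may run concurrently while the third waits for both.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (illustrative only): computes a[i] += 1, b[i] *= 2,
// then b[i] += a[i], using three small kernels submitted asynchronously.
void add_and_scale(double* a, double* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);

    // Keep both arrays on the device for the duration of all three kernels.
    #pragma omp target data map(tofrom: a[0:n], b[0:n])
    {
        // Independent kernels: nowait makes each a deferred target task,
        // so the host thread continues without waiting for completion.
        #pragma omp target teams distribute parallel for nowait depend(out: a[0:n])
        for (std::size_t i = 0; i < n; ++i) a[i] += 1.0;

        #pragma omp target teams distribute parallel for nowait depend(out: b[0:n])
        for (std::size_t i = 0; i < n; ++i) b[i] *= 2.0;

        // This kernel reads a and updates b, so it depends on both kernels above.
        #pragma omp target teams distribute parallel for nowait \
            depend(in: a[0:n]) depend(inout: b[0:n])
        for (std::size_t i = 0; i < n; ++i) b[i] += a[i];

        // Wait for all deferred target tasks before the data is copied back.
        #pragma omp taskwait
    }
}
```

The `taskwait` sits inside the `target data` region on purpose: the arrays are copied back to the host at the end of that region, so all kernels should have finished by then.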
@lucaparisi91

Most of the content is there. I need to add the concurrency topics to the slides.
