Advanced Topics #20

lucaparisi91 · 2024-08-15T12:50:02Z

This is meant to be an overview of what can be done with advanced features, rather than a detailed treatment.
No base material available.

Custom mappers ( Create mappers for data structures #1 )
- mapping of a class containing pointers
Custom memory allocators ( openmp allocators #4 )
- Per-thread memory allocation (firstprivate) vs shared variables (may live in shared, global, or local memory vs being passed as a kernel argument)
- Pinned memory allocation on CPU
- shared memory allocations
Concurrency
- Submit kernels from multiple threads (Streams with openmp #3). Demonstrates using different cudaStreams/hipStreams.
- Use OpenMP asynchronously (stretch) (asynchronous offload #7). Demonstrates overlapping memory transfers with execution, or running multiple small kernels that would be difficult to merge into one bigger kernel.
Interoperability ( Calling cuda/rocm libraries from openmp #2 )
- Demonstrate how to use CUDA with a variable mapped from OpenMP, and how to use a variable allocated from CUDA in OpenMP.
- Example of using cuFFT (or any CUDA/ROCm numerical library) together with OpenMP
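
As a starting point for the asynchronous offload item, here is a minimal sketch of several small kernels submitted without blocking the host. It is only an illustration, not the lesson material: the function name `async_sum` and the arrays are hypothetical, and it assumes a compiler with OpenMP 4.5 or later offload support. It avoids runtime API calls, so it falls back to the host when no device is present.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the lesson material): fills two arrays in
// independent asynchronous target regions, then adds them in a third region.
std::vector<double> async_sum(int n) {
    assert(n > 0);
    std::vector<double> a(n), b(n), c(n);
    double *pa = a.data(), *pb = b.data(), *pc = c.data();

    // Allocate device copies once; no host-to-device transfer is needed.
    #pragma omp target enter data map(alloc: pa[0:n], pb[0:n], pc[0:n])

    // Two small independent kernels: nowait lets the host continue, so the
    // runtime is free to run them concurrently.
    #pragma omp target teams distribute parallel for nowait depend(out: pa[0:n])
    for (int i = 0; i < n; ++i) pa[i] = 1.0 * i;

    #pragma omp target teams distribute parallel for nowait depend(out: pb[0:n])
    for (int i = 0; i < n; ++i) pb[i] = 2.0 * i;

    // This kernel starts only after both producers have finished.
    #pragma omp target teams distribute parallel for nowait \
        depend(in: pa[0:n], pb[0:n]) depend(out: pc[0:n])
    for (int i = 0; i < n; ++i) pc[i] = pa[i] + pb[i];

    // Wait for all deferred target tasks, then copy the result back.
    #pragma omp taskwait
    #pragma omp target exit data map(from: pc[0:n]) map(delete: pa[0:n], pb[0:n])
    return c;
}
```

Calling `async_sum(8)` returns a vector whose element `i` equals `3.0 * i`. Whether the first two kernels actually overlap depends on the runtime and the device, so a profiler trace would be needed to confirm it.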

lucaparisi91 · 2024-11-27T18:00:16Z

Most of the content is there. I need to add the concurrency topics to the slides.

lucaparisi91 added the lesson label Aug 15, 2024

lucaparisi91 self-assigned this Aug 15, 2024

lucaparisi91 mentioned this issue Aug 15, 2024

Introduction to OpenMP offload #19

Open

lucaparisi91 mentioned this issue Nov 28, 2024

Add material on different type of memories #35

Closed

3 tasks
