Memory allocations
- Allocation per thread (`firstprivate`) vs. shared variables (which may live in shared, global, or local memory, or instead be passed as a kernel argument)
- Pinned memory allocation on the CPU
- Shared memory allocations
Concurrency
- Submit kernels from multiple threads (Streams with openmp #3): demonstrates using different cudaStreams/hipStreams.
- Use OpenMP asynchronously, as a stretch goal (asynchronous offload #7): demonstrates overlapping memory transfers with execution, or running multiple small kernels that would be difficult to merge into one bigger kernel.
This is meant to be an overview of what can be done with the advanced features rather than a detailed treatment.
No base material available.