dluo96/gpu-programming

GPU Programming

This repository is a collection of notes, diagrams, and kernels that I am compiling (no pun intended!) to better understand GPU programming. To that end, I focus mainly on implementing GPU kernels in CUDA C and Triton.

Notes

  • Introduction to GPU compute for CUDA-capable GPUs. Covers parallel computing terms including kernels, streaming multiprocessors (SMs), CUDA cores, threads, warps, thread blocks, and grids.
  • Introduction to GPU memory. Covers concepts including registers, L1 cache, L2 cache, shared memory, global memory, memory clock rate, memory bus width, and peak memory bandwidth.
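
To make the terms above concrete, here is a minimal sketch of a vector-addition kernel in which each thread computes one output element. It is an illustration, not a file from this repository: the names `vector_add` and `blocks_for`, the problem size, and the block size of 256 are all assumptions, and unified memory (`cudaMallocManaged`) is used only to keep the host code short.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of thread blocks needed so that
// blocks * threads_per_block >= n (ceiling division).
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Kernel: runs once per thread. Each thread derives a global index from its
// block index, the block size, and its index within the block.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1000;                 // illustrative problem size
    const int threads_per_block = 256;  // illustrative block size
    const int blocks = blocks_for(n, threads_per_block);
    const size_t bytes = n * sizeof(float);

    // Unified memory is accessible from both host and device.
    float *a = nullptr, *b = nullptr, *c = nullptr;
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&c, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * static_cast<float>(i);
    }

    // Launch a grid of `blocks` thread blocks, each with `threads_per_block` threads.
    vector_add<<<blocks, threads_per_block>>>(a, b, c, n);
    cudaError_t err = cudaGetLastError();            // launch configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare against the expected result a[i] + b[i] = 3 * i.
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = fmaxf(max_err, fabsf(c[i] - 3.0f * static_cast<float>(i)));
    }
    printf("max error: %g\n", max_err);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

With these sizes the grid has 4 blocks of 256 threads, so 1024 threads are launched for 1000 elements; the bounds check keeps the extra 24 threads from writing out of range.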

CUDA C kernels

Other programs

  • Program that extracts the properties of the attached CUDA device(s).
  • CUDA Streams. See here and here and here.
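
As a rough illustration of what CUDA streams are for, the sketch below splits an array into chunks and issues the copy-in, kernel, and copy-out for each chunk on its own stream, which allows the device to overlap transfers with compute where the hardware supports it. It is not the program from this repository: the `scale` kernel, the `CHECK` macro, the `blocks_for` helper, the array size, and the stream count are all assumptions made for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: ceiling division for the grid size.
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical kernel: multiply every element by `factor`.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;        // illustrative array size
    const int num_streams = 4;    // illustrative stream count (divides n evenly)
    const int chunk = n / num_streams;
    const size_t chunk_bytes = chunk * sizeof(float);

    // Pinned (page-locked) host memory is needed for truly asynchronous copies.
    float* h_data = nullptr;
    float* d_data = nullptr;
    CHECK(cudaMallocHost(&h_data, n * sizeof(float)));
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaStream_t streams[num_streams];
    for (int s = 0; s < num_streams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Operations within one stream run in order; different streams may overlap.
    for (int s = 0; s < num_streams; ++s) {
        const int offset = s * chunk;
        CHECK(cudaMemcpyAsync(d_data + offset, h_data + offset, chunk_bytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scale<<<blocks_for(chunk, 256), 256, 0, streams[s]>>>(d_data + offset, chunk, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_data + offset, d_data + offset, chunk_bytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }

    // Wait for each stream to finish, then release it.
    for (int s = 0; s < num_streams; ++s) {
        CHECK(cudaStreamSynchronize(streams[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }

    // Every element started at 1.0f and was scaled by 2.0f.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 2.0f) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaFreeHost(h_data));
    CHECK(cudaFree(d_data));
    return 0;
}
```

Whether the chunks actually overlap depends on the device (for example, on how many copy engines it has), so this sketch shows the programming pattern rather than a guaranteed speedup.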

Setup

To run the CUDA programs in this repo, you need a host machine with a CUDA-capable GPU and nvcc (the NVIDIA CUDA compiler, which ships with the CUDA Toolkit) installed.

Usage

In general, you can compile and execute a CUDA source file as follows:

nvcc /path/to/source.cu -o /path/to/executable -run

For example, you can run the "Hello, World!" kernel using:

nvcc src/hello_world.cu -o hello_world -run
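
For orientation, a "Hello, World!" kernel can be as small as the sketch below. This is an assumed minimal version, not necessarily the contents of src/hello_world.cu; the kernel name `hello_world` and the launch configuration (`kBlocks`, `kThreadsPerBlock`) are illustrative choices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative launch configuration: 2 blocks of 4 threads each.
constexpr int kBlocks = 2;
constexpr int kThreadsPerBlock = 4;

// Every launched thread executes this function once and prints its coordinates.
__global__ void hello_world() {
    printf("Hello, World! from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello_world<<<kBlocks, kThreadsPerBlock>>>();

    // Kernel launches are asynchronous: check the launch itself, then wait for
    // the kernel to finish so that device-side printf output is flushed.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Each of the 8 threads prints one line; the order of lines across blocks is not guaranteed.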

Note that nvcc treats files with the .cu extension as CUDA source by default; to compile a file with a different extension as CUDA, pass -x cu. See the Makefile for a more complete list of commands you can run.

Device query

To query the properties and resources of the attached CUDA device(s), run:

nvcc src/device_info.cu -o device_info -run
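
A device query of this kind can be sketched with the CUDA runtime API as follows. This is an assumed minimal version rather than the repository's src/device_info.cu. The helper `peak_bandwidth_gbs` is hypothetical, and its factor of 2 assumes double-data-rate memory, so the result is a theoretical estimate only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: theoretical peak memory bandwidth in GB/s.
// Assumes double-data-rate memory (two transfers per clock cycle).
//   clock (kHz -> Hz) * bus width (bits -> bytes) * 2 / 1e9
double peak_bandwidth_gbs(int mem_clock_khz, int bus_width_bits) {
    return 2.0 * (mem_clock_khz * 1e3) * (bus_width_bits / 8.0) / 1e9;
}

int main() {
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < device_count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

        // Memory clock and bus width are read as device attributes.
        int mem_clock_khz = 0;
        int bus_width_bits = 0;
        cudaDeviceGetAttribute(&mem_clock_khz, cudaDevAttrMemoryClockRate, dev);
        cudaDeviceGetAttribute(&bus_width_bits, cudaDevAttrGlobalMemoryBusWidth, dev);

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:       %d.%d\n", prop.major, prop.minor);
        printf("  Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Warp size:                %d\n", prop.warpSize);
        printf("  Max threads per block:    %d\n", prop.maxThreadsPerBlock);
        printf("  Global memory:            %.2f GiB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        printf("  Shared memory per block:  %zu bytes\n", prop.sharedMemPerBlock);
        printf("  L2 cache size:            %d bytes\n", prop.l2CacheSize);
        printf("  Memory clock rate:        %d kHz\n", mem_clock_khz);
        printf("  Memory bus width:         %d bits\n", bus_width_bits);
        printf("  Peak memory bandwidth:    %.1f GB/s (theoretical)\n",
               peak_bandwidth_gbs(mem_clock_khz, bus_width_bits));
    }
    return 0;
}
```

The bandwidth line ties together three of the memory concepts listed in the notes: memory clock rate, memory bus width, and peak memory bandwidth.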

References
