
Commit

Added GPU offload instructions to the README
awnawab committed Mar 25, 2024
1 parent e7d18af commit f7ac931
Showing 1 changed file with 19 additions and 19 deletions.
38 changes: 19 additions & 19 deletions README.md
@@ -47,6 +47,7 @@ Further optional dependencies:
- multio (see https://github.com/ecmwf/multio)
- ocean model (e.g. NEMO or FESOM)
- fypp (see https://github.com/aradi/fypp)
- loki (see https://github.com/ecmwf-ifs/loki)

Some driver scripts to run tests and validate results rely on availability of:
- md5sum (part of GNU Coreutils; on MacOS, install with `brew install coreutils`)
@@ -218,36 +219,35 @@ Note that only `ecwam-run-model` currently supports MPI.

Running with source-term computation offloaded to the GPU
=========================================================
The calculation of the source terms in ecWam, i.e. the physics, can be offloaded for GPU execution. GPU-optimised code is
generated at build time using ECMWF's source-to-source translation toolchain Loki. Currently, two Loki transformations are
supported (in ascending order of performance):
- Single-column-coalesced (scc): fuses the vector loops and promotes them to the outermost level to target the SIMT execution model
- scc-stack: the scc transformation with a pool allocator used to allocate temporary arrays (the default)

Please note that GPU offload is under active development and will change frequently.

Single-node multi-GPU runs are also supported.
Currently, only the OpenACC programming model is supported.

Building
--------
The recommended option for building the GPU-enabled variants is to use the provided
[ecwam-bundle](https://git.ecmwf.int/users/nawd/repos/ecwam-bundle/browse?at=refs%2Fheads%2Fnaan-phys-gpu) and to pass the
`--with-loki --with-acc` options at the build step. Different Loki transformations can also be chosen at build time via the
bundle option `--loki-mode=<trafo>`.

The ecwam-bundle also provides appropriate arch files for the nvhpc suite on the ECMWF ATOS system.
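
For example, a build with the default scc-stack transformation might look roughly like the sketch below. The
`create`/`build` sub-commands of the bundle wrapper and the arch file path are assumptions and may differ on your system;
only the `--with-loki`, `--with-acc` and `--loki-mode` options are taken from the notes above.
```
./ecwam-bundle create                             # fetch the bundled packages (assumed wrapper command)
./ecwam-bundle build --with-loki --with-acc \
                     --loki-mode=scc-stack \
                     --arch=arch/ecmwf/hpc2020/nvhpc   # hypothetical arch file path
```
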
Running
-------
No extra run-time options are needed to run the GPU-enabled ecWam. Please note that this means that if ecWam is built with the
`--with-loki` and `--with-acc` bundle arguments, the source-term computation will always be offloaded for GPU execution.
For multi-GPU runs, the number of GPUs maps to the number of MPI ranks, so multiple GPUs can be requested by launching
with multiple MPI ranks. The mapping of MPI ranks to GPUs assumes at most 4 GPUs per host node.
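
For instance, a single-node run on 4 GPUs can be requested by launching 4 MPI ranks. A sketch using the `LAUNCH`
variable described under "Environment variables" below:
```
# 4 MPI ranks -> 4 GPUs on one host node (at most 4 GPUs per node are assumed)
export LAUNCH="$MPI_HOME/bin/mpirun -np 4"   # or: export LAUNCH="srun -n 4"
```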

Environment variables
---------------------

The loki-scc variant uses the CUDA runtime to manage temporary arrays and needs a large `NV_ACC_CUDA_HEAPSIZE`, e.g.
`NV_ACC_CUDA_HEAPSIZE=8G`.
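
This can be exported in the run environment before launching, for example:
```
# enlarge the device heap used by the CUDA runtime for temporary arrays
export NV_ACC_CUDA_HEAPSIZE=8G
```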

Currently, the nvhpc compiler suite cannot be used with the hpcx-openmpi suite and must instead use the version of openmpi
bundled within it. Its location is specified via the `MPI_HOME` environment variable at build time. At run time, we must
specify the location of the `mpirun` executable manually, even if running with one process. This can be done via either of
the following two options:
```
export LAUNCH="$MPI_HOME/bin/mpirun -np 1"
export LAUNCH="srun -n 1"
```

Please note that `env.sh` must be sourced to set `MPI_HOME`. For running with multiple OpenMP threads and grids finer than `O48`,
`OMP_STACKSIZE` should be set to at least `256M`.
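
Putting the run-time environment together, a minimal setup could look like the following sketch (the location of `env.sh`
depends on your build directory):
```
source env.sh                   # sets MPI_HOME, needed to locate mpirun
export OMP_STACKSIZE=256M       # for multiple OpenMP threads on grids finer than O48
export NV_ACC_CUDA_HEAPSIZE=8G
export LAUNCH="$MPI_HOME/bin/mpirun -np 1"
```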

Known issues
============