Update README for wave propagation offload
awnawab committed Mar 28, 2024
1 parent efed84d commit aefac38
Showing 1 changed file with 15 additions and 12 deletions.
27 changes: 15 additions & 12 deletions README.md
@@ -218,31 +218,33 @@ there are following options:
Note that only `ecwam-run-model` currently supports MPI.


Running with source-term computation offloaded to the GPU
=========================================================
The calculation of the source-terms in ecWam, i.e. the physics, can be offloaded for GPU execution.
GPU optimised code is generated at build-time using ECMWF's source-to-source translation toolchain Loki. Currently,
three Loki transformations are supported:
GPU offload
===========
ecWAM can be offloaded for GPU execution. GPU optimised code for the wave propagation kernel is committed to source,
whereas GPU code for the source-term computation is generated at build-time using ECMWF's source-to-source
translation toolchain Loki. Currently, three Loki transformations are supported:
- Single-column-coalesced (scc): Fuse vector loops and promote to the outermost level to target the SIMT execution model
- scc-hoist: The scc transformation with temporary arrays hoisted to the driver-layer
- scc-stack: The scc transformation with a pool allocator used to allocate temporary arrays
- scc-stack: The scc transformation with a pool allocator used to allocate temporary arrays (the default)

The scc-hoist and scc-stack transformations offer superior performance to the scc transformation. Currently, only the
OpenACC programming model on Nvidia GPUs is supported.

Building
--------
The recommended option for building the GPU enabled variants is to use the provided bundle, and pass the `--with-loki --with-acc`
options. Different Loki transformations can also be chosen at build-time via the following bundle option: `--loki-mode=<trafo>`.
The recommended option for building the GPU enabled ecWAM is to use the provided bundle, and pass the
`--with-loki --with-acc` options. Different Loki transformations can also be chosen at build-time via the following
bundle option: `--loki-mode=<trafo>`. Direct GPU-to-GPU MPI communications can be enabled by passing the
`--with-gpu-aware-mpi` option.

The ecwam-bundle also provides appropriate arch files for the nvhpc suite on the ECMWF ATOS system.
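
As a rough illustration of how the build options above combine, the sketch below assembles them into a single command line and prints it rather than running it. It assumes the bundle is driven by an `ecwam-bundle` script with a `build` sub-command invoked from the bundle root; the variable names (`LOKI_MODE`, `USE_GPU_AWARE_MPI`, `BUILD_OPTS`, `BUILD_CMD`) are hypothetical and exist only for this example.

```shell
# Sketch only: collect the GPU build options named in this section and print
# the resulting command instead of executing it.
# Assumption: the bundle is built via `./ecwam-bundle build <options>`.
LOKI_MODE="scc-stack"          # one of: scc, scc-hoist, scc-stack
BUILD_OPTS="--with-loki --with-acc --loki-mode=${LOKI_MODE}"

# Optionally enable direct GPU-to-GPU MPI communications.
USE_GPU_AWARE_MPI=1
if [ "${USE_GPU_AWARE_MPI}" -eq 1 ]; then
    BUILD_OPTS="${BUILD_OPTS} --with-gpu-aware-mpi"
fi

BUILD_CMD="./ecwam-bundle build ${BUILD_OPTS}"
echo "${BUILD_CMD}"
```

Setting `USE_GPU_AWARE_MPI=0` drops the last option, and changing `LOKI_MODE` selects a different Loki transformation.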

Running
-------
No extra run-time options are needed to run the GPU enabled ecWam. Please note that this means that if ecWam is built with
`--with-loki` and `--with-acc` bundle arguments, the source-term computation will necessarily be offloaded for GPU execution.
For multi-GPU runs, the number of GPUs maps to the number of MPI ranks. Thus multiple GPUs can be requested by launching
with multiple MPI ranks. The mapping of MPI ranks to GPUs assumes at most 4 GPUs per host node.
No extra run-time options are needed to run the GPU enabled ecWam. Please note that this means that if ecWam is built
using the `--with-loki` and `--with-acc` bundle arguments, it will necessarily be offloaded for GPU execution.
For multi-GPU runs, the number of GPUs maps to the number of MPI ranks. Thus multiple GPUs can be requested by
launching with multiple MPI ranks. The mapping of MPI ranks to GPUs assumes at most 4 GPUs per host node.
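
To make the rank-to-GPU relationship concrete, the following sketch prints one possible placement for an 8-rank run. The modulo rule, the contiguous placement of 4 ranks per node, and the helper name `gpu_for_rank` are assumptions made for illustration; the mapping ecWAM applies internally may differ in detail.

```shell
# Sketch only: one MPI rank drives one GPU, with at most 4 GPUs per host node.
# Assumption: ranks are placed contiguously, 4 per node, and the node-local
# GPU index is obtained with a simple modulo.
MAX_GPUS_PER_NODE=4

# gpu_for_rank (hypothetical helper): print the node-local GPU index of a rank.
gpu_for_rank() {
    echo $(( $1 % MAX_GPUS_PER_NODE ))
}

# Print the assumed placement for a run with 8 MPI ranks (i.e. 8 GPUs).
NRANKS=8
rank=0
while [ "${rank}" -lt "${NRANKS}" ]; do
    echo "rank ${rank} -> GPU $(gpu_for_rank "${rank}")"
    rank=$(( rank + 1 ))
done
```

Under these assumptions, an 8-rank run would occupy two host nodes with 4 GPUs each.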

Environment variables
---------------------
@@ -261,6 +263,7 @@ Known issues
a floating point exception during the call to `MPI_INIT`.
The flag `-ffpe-trap=overflow` is set e.g. for `Debug` build type.
Floating point exceptions on arm64 manifest as a `SIGILL`.
2) The coarsest configuration, i.e. `O48`, should be run with no more than one GPU.

Reporting Bugs
==============