diff --git a/README.md b/README.md
index 29cbc479..430e4ef1 100644
--- a/README.md
+++ b/README.md
@@ -217,15 +217,18 @@ there are following options:
 
 
 Note that only `ecwam-run-model` currently supports MPI.
+
 Running with source-term computation offloaded to the GPU
 =========================================================
-The calculation of the source-terms in ecWam, i.e. the physics, can be offloaded for GPU execution. GPU optimised code is
-generated at build-time using ECMWF's source-to-source translation toolchain Loki. Currently, two Loki transformations are supported
-(in ascending order of performance):
+The calculation of the source terms in ecWam, i.e. the physics, can be offloaded for GPU execution.
+GPU-optimised code is generated at build time using ECMWF's source-to-source translation toolchain Loki. Currently,
+three Loki transformations are supported:
 - Single-column-coalesced (scc): Fuse vector loops and promote to the outermost level to target the SIMT execution model
-- scc-stack: The scc transformation with a pool allocator used to allocate temporary arrays (the default)
+- scc-hoist: The scc transformation with temporary arrays hoisted to the driver layer
+- scc-stack: The scc transformation with a pool allocator used to allocate temporary arrays
 
-Currently, only the OpenACC programming model is supported.
+The scc-hoist and scc-stack transformations outperform the plain scc transformation. Currently, only the
+OpenACC programming model on NVIDIA GPUs is supported.
 
 Building
 --------
diff --git a/src/ecwam/airsea.F90 b/src/ecwam/airsea.F90
index 42e0798a..d5e11507 100644
--- a/src/ecwam/airsea.F90
+++ b/src/ecwam/airsea.F90
@@ -58,6 +58,7 @@ SUBROUTINE AIRSEA (KIJS, KIJL, &
 
 
       USE PARKIND_WAVE, ONLY : JWIM, JWRB, JWRU
+      USE YOWFRED , ONLY : NWAV_GC ! needed for Loki
       USE YOWPARAM, ONLY : NANG ,NFRE
       USE YOWPHYS, ONLY : XKAPPA, XNLEV
       USE YOWTEST, ONLY : IU06
diff --git a/src/ecwam/implsch.F90 b/src/ecwam/implsch.F90
index 4a3515c1..c88f9c23 100644
--- a/src/ecwam/implsch.F90
+++ b/src/ecwam/implsch.F90
@@ -88,6 +88,8 @@ SUBROUTINE IMPLSCH (KIJS, KIJL, FL1, &
       USE YOWPCONS , ONLY : WSEMEAN_MIN, ROWATERM1
       USE YOWSTAT , ONLY : IDELT ,LBIWBK
       USE YOWWNDG , ONLY : ICODE ,ICODE_CPL
+      USE YOWINDN , ONLY : MLSTHG ! needed for Loki
+      USE YOWFRED , ONLY : NWAV_GC ! needed for Loki
 
       USE YOMHOOK , ONLY : LHOOK, DR_HOOK, JPHOOK
 
diff --git a/src/ecwam/sinflx.F90 b/src/ecwam/sinflx.F90
index 2ebe4fc2..215f30f3 100644
--- a/src/ecwam/sinflx.F90
+++ b/src/ecwam/sinflx.F90
@@ -29,6 +29,7 @@ SUBROUTINE SINFLX (ICALL, NCALL, KIJS, KIJL, &
 
 
       USE PARKIND_WAVE, ONLY : JWIM, JWRB, JWRU
+      USE YOWFRED , ONLY : NWAV_GC ! needed for Loki
       USE YOWCOUP , ONLY : LWCOU ,LLCAPCHNK , LLGCBZ0, LLNORMAGAM
       USE YOWPARAM , ONLY : NANG ,NFRE
       USE YOWPHYS , ONLY : DTHRN_A ,DTHRN_U
diff --git a/src/ecwam/taut_z0.F90 b/src/ecwam/taut_z0.F90
index 7b884081..3fc5b44c 100644
--- a/src/ecwam/taut_z0.F90
+++ b/src/ecwam/taut_z0.F90
@@ -68,6 +68,7 @@ SUBROUTINE TAUT_Z0(KIJS, KIJL, IUSFG, &
 
 
       USE PARKIND_WAVE, ONLY : JWIM, JWRB, JWRU
+      USE YOWFRED , ONLY : NWAV_GC ! needed for Loki
       USE YOWCOUP , ONLY : LLCAPCHNK, LLGCBZ0
       USE YOWPARAM , ONLY : NANG ,NFRE
       USE YOWPCONS , ONLY : G, GM1, EPSUS, EPSMIN, ACD, BCD, CDMAX
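The three transformations listed in the README differ mainly in where the per-column temporary arrays live. The following is a minimal CPU-only Fortran sketch of that difference. It assumes a hypothetical toy kernel that needs one temporary of length `nfre` per column. The module, subroutine and variable names (`temp_alloc_sketch`, `kernel_hoisted`, `pool`, `offset`, ...) are illustrative only. This is not code that Loki generates: the real output adds OpenACC directives and manages the pool differently.

```fortran
! Hypothetical toy kernels, NOT Loki output: no OpenACC, CPU-only illustration.
module temp_alloc_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! (1) scc-like: the temporary is an automatic array inside the kernel,
  !     so every kernel invocation sets up its own storage.
  subroutine kernel_local(nfre, fl, res)
    integer, intent(in) :: nfre
    real(dp), intent(in) :: fl(nfre)
    real(dp), intent(out) :: res
    real(dp) :: tmp(nfre)                  ! automatic temporary
    tmp = fl**2
    res = sum(tmp)
  end subroutine kernel_local

  ! (2) scc-hoist-like: the driver owns the temporary and passes it in.
  subroutine kernel_hoisted(nfre, fl, tmp, res)
    integer, intent(in) :: nfre
    real(dp), intent(in) :: fl(nfre)
    real(dp), intent(inout) :: tmp(nfre)   ! storage allocated by the driver
    real(dp), intent(out) :: res
    tmp = fl**2
    res = sum(tmp)
  end subroutine kernel_hoisted

  ! (3) scc-stack-like: the driver owns one flat pool and the kernel carves
  !     its temporary out of it, starting at a per-column offset.
  subroutine kernel_stack(nfre, fl, pool, offset, res)
    integer, intent(in) :: nfre, offset
    real(dp), intent(in) :: fl(nfre)
    real(dp), intent(inout) :: pool(:)
    real(dp), intent(out) :: res
    pool(offset+1:offset+nfre) = fl**2
    res = sum(pool(offset+1:offset+nfre))
  end subroutine kernel_stack

  ! Driver: the column loop is the loop a GPU would parallelise.
  ! Returns the per-column results of all three variants.
  subroutine driver(nfre, ncol, fl, r_local, r_hoist, r_stack)
    integer, intent(in) :: nfre, ncol
    real(dp), intent(in) :: fl(nfre, ncol)
    real(dp), intent(out) :: r_local(ncol), r_hoist(ncol), r_stack(ncol)
    real(dp), allocatable :: tmp(:,:), pool(:)
    integer :: jcol

    allocate(tmp(nfre, ncol))     ! hoisted: one column slice per column
    allocate(pool(nfre * ncol))   ! pool: one flat buffer shared by all columns

    do jcol = 1, ncol
      call kernel_local(nfre, fl(:, jcol), r_local(jcol))
      call kernel_hoisted(nfre, fl(:, jcol), tmp(:, jcol), r_hoist(jcol))
      call kernel_stack(nfre, fl(:, jcol), pool, (jcol - 1) * nfre, r_stack(jcol))
    end do

    deallocate(tmp, pool)
  end subroutine driver
end module temp_alloc_sketch
```

All three variants compute the same result. The difference is where the memory comes from. Variant (1) must set up its temporary inside every kernel invocation. Variants (2) and (3) move that allocation out of the kernel and into the driver. This matches the README's statement that scc-hoist and scc-stack outperform plain scc.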