GPU Programming

Overview

Scientific applications that use GPUs for general-purpose computing usually employ one or more of the following programming models:

Offloading via OpenACC or OpenMP directives
Direct kernel launching via CUDA C/C++ or Fortran
Asynchronous kernel launching via OpenCL

These and other approaches are described and taught in various tutorials and in-depth documentation, e.g.:

ICON-specific documentation for GPU programming can be found at https://gitlab.dkrz.de/icon/wiki/-/wikis/GPU-development (only accessible to ICON developers).

Python packages are not GPU-ready by default; a brief introduction to using Python on Levante’s GPU nodes can be found here.

Compiling and Linking

To build GPU-adapted programs on Levante, you can use the NVIDIA HPC SDK compilers for C/C++ and Fortran. These compilers support building GPU-ready code using CUDA, OpenACC, OpenCL and OpenMP.

To make these available in your shell environment, you need to load an nvhpc environment module, e.g.:

$ module load nvhpc/22.5-gcc-11.2.0

The compiler names are nvc, nvc++, and nvfortran for C, C++ and Fortran programs respectively. nvcc is used for CUDA C/C++ programs.

The table below lists a number of relevant compiler options, including recommendations for those parts of the code that usually do not run on the GPU, such as I/O routines.

Option	Description
`-mp`	Generates multi-threaded or accelerated code based on the OpenMP directives.
`-acc`	Enables OpenACC pragmas and directives.
`-target=gpu`	Selects the NVIDIA GPU as the target for parallel programming paradigms.
`-Minfo=accel`	Displays information about the regions the compiler has identified for GPU acceleration.
`-tp=zen3`	Indicates the processor for which code is generated.
`-Kieee`, `-Knoieee`	Performs floating-point operations in strict conformance with the IEEE 754 standard. With `-Kieee`, some optimizations are disabled and a more accurate math library is used. The default, `-Knoieee`, uses faster but potentially non-conformant methods.
`-Mfma`	Allows the use of fused multiply-add (FMA) instructions. The alternative, `-Mnofma`, enforces separate multiply and add instructions. `-O3` enables FMA by default.

Comprehensive documentation on using NVIDIA HPC compilers and program development tools can be found in the vendor documentation.

To build programs that are both GPU-enabled and MPI-parallel, you additionally need to load the following OpenMPI module:

$ module load openmpi/.4.1.4-nvhpc-22.5

You can check that OpenMPI has been built with GPU support (via CUDA) by querying ompi_info:

$ ompi_info -c | grep 'MPI extensions'
      MPI extensions: affinity, cuda, pcollreq

If OpenMPI was built with GPU support, cuda is listed among the extensions.

The recommended method to build MPI programs is to use the wrappers, i.e. to use mpicc instead of nvc, mpic++ instead of nvc++ and mpifort instead of nvfortran.