# GPU Programming

## Overview
Scientific applications that use GPUs for general-purpose computing usually employ one or more programming models such as CUDA, OpenACC, OpenCL, or OpenMP offloading.

These and other approaches are described and taught in various tutorials and in-depth documentation.
ICON-specific documentation for GPU programming can be found at https://gitlab.dkrz.de/icon/wiki/-/wikis/GPU-development (only accessible to ICON developers).
Python packages are not GPU-ready by default; a brief introduction to using Python on Levante's GPU nodes can be found here.
## Compiling and Linking
To build GPU-adapted programs on Levante, you can use the NVIDIA HPC SDK compilers for C/C++ and Fortran. These compilers support building GPU-ready code using CUDA, OpenACC, OpenCL, and OpenMP.
To make these available in your shell environment, you need to load an `nvhpc` environment module, e.g.:

```shell
$ module load nvhpc/22.5-gcc-11.2.0
```
The compiler names are `nvc`, `nvc++`, and `nvfortran` for C, C++, and Fortran programs, respectively. `nvcc` is used for CUDA C/C++ programs.
The table below lists a number of relevant compiler options, including recommendations for those parts of the code that do not usually run on the GPU, such as I/O routines.
| Option | Description |
|---|---|
| `-mp` | Generates multi-threaded or accelerated code based on the OpenMP directives |
| `-acc` | Enables OpenACC pragmas and directives |
| `-target=gpu` | Selects the NVIDIA GPU target for parallel programming paradigms |
| `-Minfo=accel` | Displays information about regions for GPU acceleration inferred by the compiler |
| `-tp=zen3` | Indicates the processor for which code is generated |
| `-Kieee`, `-Knoieee` | Perform floating-point operations in strict conformance with the IEEE 754 standard. Some optimizations are disabled with `-Kieee`, and a more accurate math library is used. The default is `-Knoieee`. |
| `-Mfma`, `-Mnofma` | Allow usage of fused multiply-add (FMA) instructions. The alternative, `-Mnofma`, disables their generation. |
Comprehensive documentation on using NVIDIA HPC compilers and program development tools can be found in the vendor documentation.
To build GPU-enabled and MPI-parallel programs, you additionally need to load the following OpenMPI module:

```shell
$ module load openmpi/.4.1.4-nvhpc-22.5
```
You can check that OpenMPI has been built with GPU support (via CUDA) by querying `ompi_info`:

```shell
$ ompi_info -c | grep 'MPI extensions'
  MPI extensions: affinity, cuda, pcollreq
```

`cuda` must then be listed among the extensions.
The recommended method to build MPI programs is to use the compiler wrappers, i.e. to use `mpicc` instead of `nvc`, `mpic++` instead of `nvc++`, and `mpifort` instead of `nvfortran`.