Runtime Settings

Warning

Levante is not yet fully available. Please check regularly which limitations currently apply.

When running programs on Levante, various settings might be needed to achieve satisfactory performance or, in some cases, even to allow a program to run at all. This section describes the environment settings that most often need to be set in Slurm batch scripts.
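
As a point of reference, the sketch below shows where such settings are typically placed in a Slurm batch script. The job name, partition, account, and program are placeholders and have to be adapted to your project.

#!/bin/bash
#SBATCH --job-name=my_job      # placeholder job name
#SBATCH --partition=compute    # placeholder partition
#SBATCH --account=xz0123       # placeholder project account
#SBATCH --nodes=2
#SBATCH --time=00:30:00

# environment settings (see the following subsections) are placed here,
# before the program is launched
export EXAMPLE_VARIABLE=value

srun ./my_program              # placeholder executable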

MPI Runtime Settings

Modern MPI library implementations provide a large number of user-configurable parameters and algorithms for performance tuning. Although the local configuration of MPI libraries is initially performed by the vendor to match the characteristics of the cluster, the performance of a specific application can often be improved further, by up to 15%, through an optimal choice of tunable parameters.

Since tuning options are specific to the MPI library and the application, the recommendations for MPI runtime settings below are just a starting point for each version.

Open MPI 4.0.0 and later

As a minimal environment setting we recommend the following to make use of the UCX toolkit. This is just a starting point; users will have to tune the environment depending on the application used.

export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl=self
export OMPI_MCA_osc="pt2pt"
export UCX_IB_ADDR_TYPE=ib_global
# depending on the application, you may or may not want to disable HCOLL
export OMPI_MCA_coll="^ml,hcoll"
export OMPI_MCA_coll_hcoll_enable="0"
export HCOLL_ENABLE_MCAST_ALL="0"
export HCOLL_MAIN_IB=mlx5_0:1
export UCX_NET_DEVICES=mlx5_0:1
export UCX_TLS=mm,knem,cma,dc_mlx5,dc_x,self
export UCX_UNIFIED_MODE=y
export HDF5_USE_FILE_LOCKING=FALSE
export OMPI_MCA_io="romio321"
export UCX_HANDLE_ERRORS=bt
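
The transports and network devices available on a node, which are relevant for the UCX_TLS and UCX_NET_DEVICES settings above, can be listed with the ucx_info tool that ships with UCX:

ucx_info -d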

The ompi_info tool can be used to get detailed information about the Open MPI installation and its local configuration:

ompi_info --all
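
Since the full output is very long, it can be restricted to a single framework or component. For example, to show only the parameters of the UCX PML selected above:

ompi_info --param pml ucx --level 9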

Intel MPI

Environment variables for Intel MPI start with the I_MPI_ prefix. A complete reference of these variables can be found on Intel’s website. To run programs built with Intel MPI on Levante, you should set at least the following environment variables:

export I_MPI_PMI=pmi
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

For large jobs we recommend using PMI-2 instead of PMI. The corresponding settings are:

export I_MPI_PMI=pmi2
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so

srun --mpi=pmi2 ...
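
To check which PMI plugins the local Slurm installation provides, srun can list them:

srun --mpi=list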

Resource Limits

Stack Size

Using an unlimited stack size might have a negative influence on performance. Also, an unlimited stack can hide invalid memory accesses. It is therefore recommended to set the limit to the amount actually needed. For example, to set the stack size limit to 200 MB (200*1024 KB), use one of the following statements:

ulimit -s 204800         # bash
limit stacksize 204800   # tcsh
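
The currently effective stack limit can be queried in the same way, for example to verify the setting inside a batch script:

ulimit -s          # bash
limit stacksize    # tcsh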

It might be necessary to increase the stack size further if your program uses large automatic arrays. If the stack size is too small, the program will usually crash with an error message like this:

"Caught signal 11 (Segmentation fault: address not mapped to object at
 address 0x0123456789abcdef)".

Obviously, the actual address will vary. If increasing the stack size does not resolve the abort, a segmentation fault is a strong indication of a bug in your program.

Core File Size

It is also recommended to disable core file generation unless needed for debugging purposes:

ulimit -c 0    # bash
limit core 0   # tcsh
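
Conversely, if core dumps are needed for debugging, the limit can be lifted again for that particular run:

ulimit -c unlimited    # bash
limit core unlimited   # tcsh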

All current limits can be listed with the following command:

ulimit -a    # bash
limit        # tcsh