Example Batch Scripts#

Two-way simultaneous multithreading (SMT) is enabled on all Levante nodes, i.e. the operating system recognizes 256 logical CPUs per node, although there are only 128 physical cores. In most cases, it is advisable not to use the simultaneous threads for your application but to leave them to the operating system.
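
If you want to check this on a node yourself, one way is to look at the CPU topology with lscpu, for example from within an interactive job (a minimal sketch; the exact labels may differ slightly between lscpu versions):

# Two threads per core and 128 cores per node in total indicate two-way SMT
lscpu | grep -E '^CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'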

Below, example batch scripts are provided for the following use cases: an MPI job without simultaneous multithreading, a hybrid (MPI/OpenMP) job without simultaneous multithreading, and a serial job.

MPI job without simultaneous multithreading#

The overall structure of the batch script does not depend on whether you are using IntelMPI or OpenMPI (or any other MPI implementation). Specific environment variables should, however, be set in order to fine-tune the chosen MPI library. In particular, the parallel application should always be started with the srun command instead of invoking mpirun, mpiexec or other launchers.

In the following example, 12*128 = 1536 cores are used to execute a parallel program.

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=compute
#SBATCH --nodes=12
#SBATCH --ntasks-per-node=128
#SBATCH --exclusive
#SBATCH --time=00:30:00
#SBATCH --mail-type=FAIL
#SBATCH --account=xz0123
#SBATCH --output=my_job.%j.out

# Limit the stack size (adjust to your program's needs)
# and the core file size
ulimit -s 204800
ulimit -c 0

# Replace this block according to https://docs.dkrz.de/doc/levante/running-jobs/runtime-settings.html#mpi-runtime-settings
echo "Replace this block according to  https://docs.dkrz.de/doc/levante/running-jobs/runtime-settings.html#mpi-runtime-settings"
exit 23
# End of block to replace

# Use srun (not mpirun or mpiexec) command to launch
# programs compiled with any MPI library
srun -l --cpu_bind=verbose --hint=nomultithread \
  --distribution=block:cyclic ./myprog

Note: --hint=nomultithread cannot be used in conjunction with --ntasks-per-core, --threads-per-core and --cpu-bind (--cpu-bind=verbose is allowed, though).

Please also read the section on compiling and linking MPI programs on Levante.
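
Assuming the script above has been saved as my_job.sh (the file name is arbitrary), it can be submitted and monitored like this:

# Submit the batch script to Slurm; sbatch prints the job ID
sbatch my_job.sh
# List your pending and running jobs
squeue -u $USER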

Hybrid (MPI/OpenMP) job without simultaneous multithreading#

The following example job will allocate 4 nodes from the compute partition for 1 hour. The job will launch 32 MPI ranks per node and 4 OpenMP threads per rank, so that all 128 physical cores of each node are used.

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=compute    # Specify partition name
#SBATCH --nodes=4              # Specify number of nodes
#SBATCH --ntasks-per-node=32   # Specify number of (MPI) tasks on each node
#SBATCH --time=01:00:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output

# Bind your OpenMP threads
export OMP_NUM_THREADS=4
export KMP_AFFINITY="verbose,granularity=fine,scatter"
export KMP_LIBRARY="turnaround"


# Limit the stack size (adjust to your program's needs)
# and the core file size
ulimit -s 204800
ulimit -c 0
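# Set the stack size available to each OpenMP thread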
export OMP_STACKSIZE=128M

# Replace this block according to https://docs.dkrz.de/doc/levante/running-jobs/runtime-settings.html#mpi-runtime-settings
echo "Replace this block according to https://docs.dkrz.de/doc/levante/running-jobs/runtime-settings.html#mpi-runtime-settings"
exit 23
# End of block to replace


# Use srun (not mpirun or mpiexec) command to launch
# programs compiled with any MPI library
srun -l --cpu_bind=verbose --hint=nomultithread \
  --distribution=block:cyclic:block ./myprog
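
The KMP_AFFINITY and KMP_LIBRARY variables used above are specific to the Intel OpenMP runtime. If your program is built with a different compiler, the standard OpenMP environment variables can be used for thread binding instead (a minimal sketch, assuming the default per-core placement is suitable for your program):

# Portable OpenMP thread binding (alternative to KMP_AFFINITY)
export OMP_NUM_THREADS=4
export OMP_PLACES=cores      # one place per physical core
export OMP_PROC_BIND=close   # keep each rank's threads on neighbouring places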

Serial job#

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=shared     # Specify partition name
#SBATCH --mem=10G              # Specify amount of memory needed
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output

set -e
ulimit -s 204800

module load python3

# Execute serial programs, e.g.
python -u /path/to/myscript.py

The shared partition has a limit of 960 MB of memory per CPU. If your serial job needs more memory, you have to increase the requested amount (--mem) accordingly; Slurm will then automatically increase the number of CPUs allocated to the job.
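
For example, with the --mem=10G (i.e. 10240 MB) request shown above and assuming the 960 MB per-CPU limit, Slurm would allocate ceil(10240 / 960) = 11 CPUs to the job. The resources actually granted can be checked afterwards, e.g. with sacct (replace <jobid> with the ID printed by sbatch):

# Show the number of CPUs allocated and the memory requested for a job
sacct -j <jobid> --format=JobID,AllocCPUS,ReqMem,State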