Partitions and Limits#
Partitions#
In SLURM, multiple nodes can be grouped into partitions, which are sets of nodes with associated limits for wall-clock time, job size, etc. These limits are hard limits for jobs and can only be overridden by a QOS (Quality of Service). The defined partitions can overlap, i.e. one node might be contained in several partitions.
Jobs are allocations of resources by users in order to execute tasks on the cluster for a specified period of time. Furthermore, SLURM uses the concept of job steps to describe a set of different tasks within a job. Job steps can be thought of as smaller allocations or jobs within the job, which can be executed sequentially or in parallel during the main job allocation.
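As a sketch of the job/job-step relationship described above (the account, partition, and program names are placeholders, not Levante-specific values), a batch script might launch several job steps within one allocation:

```shell
#!/bin/bash
# Hypothetical example: account and executable names are placeholders.
#SBATCH --job-name=steps_demo
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --time=00:30:00
#SBATCH --account=xz0123        # replace with your project account

# Job step 1: runs to completion before the next steps start
srun --ntasks=2 ./preprocess    # hypothetical executable

# Job steps 2 and 3: launched in parallel within the same allocation
srun --ntasks=1 ./model_a &     # hypothetical executable
srun --ntasks=1 ./model_b &     # hypothetical executable
wait
```

Each `srun` invocation inside the script creates one job step; backgrounding them with `&` lets the steps share the allocation concurrently.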
The SLURM sinfo command lists all partitions and nodes managed by SLURM on Levante and provides general information about the current node states (Allocated/Idle/Other/Total):
$ sinfo -o "%.11P %.5a %.10l %.14F %.20f %N" -p compute,gpu,shared,interactive
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) AVAIL_FEATURES NODELIST
gpu up 12:00:00 31/24/1/56 512G,cell13,a100_80 l[50000,50003,50006,50009,50012,50015,50018,50021,50024,50027,50030,50033,50036,50039,50042,50045,50048,50051,50054,50057,50060,50063,50066,50069,50072,50075,50078,50081,50100,50103,50106,50109,50112,50115,50118,50121,50124,50127,50130,50133,50136,50139,50142,50145,50148,50151,50154,50157,50160,50163,50166,50169,50172,50175,50178,50181]
gpu up 12:00:00 0/2/0/2 512G,cell09,a100_40 l[40360,40363]
gpu up 12:00:00 0/2/0/2 1024G,cell09,a100_40 l[40366,40369]
compute up 8:00:00 182/94/9/285 256G,cell01 l[10000-10058,10060-10095,10100-10158,10160-10195,10200-10258,10260-10295]
compute up 8:00:00 245/37/6/288 256G,cell02 l[10300-10395,10400-10495,10500-10595]
compute up 8:00:00 199/88/1/288 256G,cell03 l[10600-10695,10700-10795,20000-20095]
compute up 8:00:00 236/49/3/288 256G,cell04 l[20100-20195,20200-20295,20300-20395]
compute up 8:00:00 194/94/0/288 256G,cell05 l[20400-20495,20500-20595,20600-20695]
compute up 8:00:00 48/0/0/48 512G,cell08 l[40027-40047,40063-40083,40090-40095]
compute up 8:00:00 220/7/4/231 512G,cell09 l[40100-40183,40190-40195,40200-40283,40287-40295,40300-40347]
compute up 8:00:00 12/0/0/12 1024G,cell09 l[40348-40359]
compute up 8:00:00 6/0/0/6 1024G,cell11 l[40660-40665]
compute up 8:00:00 31/251/6/288 256G,cell06 l[30000-30095,30100-30195,30200-30295]
compute up 8:00:00 0/287/1/288 256G,cell07 l[30300-30395,30400-30495,30500-30595]
compute up 8:00:00 12/187/5/204 256G,cell08 l[30600-30695,30700-30795,40015-40026]
compute up 8:00:00 0/270/9/279 256G,cell11 l[40400-40495,40500-40595,40600-40659,40666-40683,40687-40695]
compute up 8:00:00 25/134/0/159 256G,cell10 l[50200-50295,50300-50359,50369-50371]
shared up 7-00:00:00 8/7/0/15 256G,cell08 l[40000-40014]
interactive up 12:00:00 2/13/0/15 512G,cell08 l[40048-40062]
For detailed information about all available partitions and their limits, use the SLURM scontrol command as follows:
$ scontrol show partition
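To inspect a single partition rather than all of them, the partition name can be appended:

```shell
# Show the configuration and limits of the compute partition only
scontrol show partition compute
```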
The following publicly available partitions are currently defined on Levante:
compute#
This partition consists of 2952 AMD EPYC 7763 Milan compute nodes and is intended for running parallel scientific applications. The compute nodes allocated for a job are used exclusively and cannot be shared with other jobs.
The partition contains nodes with different memory
configurations. If you want to use the
entire memory of a larger node, you have to request all memory with
the --mem=0
option.
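A minimal sketch of such a request follows; the feature name passed to `--constraint` is taken from the AVAIL_FEATURES column of the sinfo output above, and the executable name is a placeholder:

```shell
#!/bin/bash
# Hypothetical example: request a large-memory compute node and all of its memory.
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --mem=0            # request the entire memory of the node
#SBATCH --constraint=512G  # select a 512 GB node (feature name as shown by sinfo)
#SBATCH --time=01:00:00
#SBATCH --account=xz0123   # replace with your project account

srun ./my_program          # hypothetical executable
```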
interactive#
The interactive partition is made up of 15 nodes but can be
dynamically expanded if there is a short-term need. It is intended for
memory or compute intensive data processing and compilation
tasks that should not run on the login nodes. Nodes of this partition
can be shared with other jobs if a single job does not allocate all
resources. Use salloc
to allocate the resources and directly jump
to that node. Basically, this partition should not have any waiting
times. The total amount of resources per user in this partition is
limited to an equivalent of one node.
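A minimal salloc invocation might look like the following (the account name is a placeholder, and the chosen CPU count and time are illustrative, within the partition limits):

```shell
# Allocate 8 tasks on the interactive partition for 4 hours
salloc --partition=interactive --ntasks=8 --time=04:00:00 --account=xz0123
```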
gpu#
The 60 nodes in this partition are each equipped with 2 AMD EPYC 7713 Milan CPUs and 4 additional Nvidia A100 GPUs. These can be used for GPGPU-aware scientific applications (e.g. via OpenACC programming) or for interactive 3-dimensional data visualization via VirtualGL/TurboVNC. More details on how to use the GPU nodes are given under Using GPU nodes.
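A sketch of a GPU job request is shown below; `--gpus` is the generic SLURM option for requesting GPUs, but site-specific GRES syntax may differ, and the account and executable names are placeholders:

```shell
#!/bin/bash
# Hypothetical example: request one GPU node with all four A100 GPUs.
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus=4           # generic SLURM GPU option; check site docs for GRES specifics
#SBATCH --time=02:00:00
#SBATCH --account=xz0123   # replace with your project account

srun ./gpu_application     # hypothetical executable
```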
Limits#
The SLURM limits configured for different partitions are:
| Partition Name | Max Nodes per Job | Max Job Runtime | Max resources* | Shared Node Usage | Default Memory per CPU | Max Memory per CPU |
|---|---|---|---|---|---|---|
| compute | 512 | 8 hours | no limit | no | 940 MB | 3940 MB |
| shared | 1 | 7 days | 512 CPUs | yes | 940 MB | 940 MB |
| interactive | 1 | 12 hours | 256 CPUs | yes | 940 MB | 1940 MB |
| gpu | 60 | 12 hours | no limit | yes | 940 MB | 3940 MB |
| gpu-devel | 1 | 30 minutes | no limit | yes | 940 MB | 1940 MB |

*used simultaneously by all running jobs of a user
Hint
If your jobs require either longer execution times or more nodes, contact DKRZ Help Desk. The predefined limits can be adjusted for a limited time to match your purposes by specifying an appropriate Quality of Service (QOS). Please, include the following information in your request: username(s), project id, the reason why you need higher limits, what limits to increase, and for how long those should be increased. Also a brief justification by your project admin is needed.
CAUTION: All jobs on Levante have to be assigned to a partition - there is no default partition available. Choosing the partition can be done in various ways:
Environment variable
export SBATCH_PARTITION=<partitionname>
Batch script option
#SBATCH [-p|--partition=]<partitionname>
Command line option
sbatch [-p|--partition=]<partitionname>
Note that an environment variable will override any matching option set in the batch script, and a command-line option will override any matching environment variable.
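This precedence can be illustrated as follows (the partition names and script name are illustrative):

```shell
# The environment variable overrides any #SBATCH --partition line in job.sh ...
export SBATCH_PARTITION=compute

# ... but an explicit command-line option overrides the environment variable,
# so this job is submitted to the shared partition.
sbatch --partition=shared job.sh
```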
To control the job workload on the Levante cluster and keep SLURM responsive, we enforce the following restrictions on the number of jobs:
| SLURM Limits | Max Number of Submitted Jobs | Max Number of Running Jobs (GPU Partition) | Max Number of Running Jobs (Other Partitions) |
|---|---|---|---|
| Per User and Account | 1000 | 5 | 20 |
If needed, you can ask for higher limits by sending a request with a short justification to support@dkrz.de. Based on the technical limitations and a fair share between all users, we might then arrange a QOS for some limited time.
To list job limits and quality of services relevant to you, use the sacctmgr command, for example:
sacctmgr -s show user $USER
sacctmgr -s show user $USER format=user,account,maxjobs,maxsubmit,qos