Accounting and Priorities
Concept of job priority

The individual job priority is computed as a weighted sum of three factors (see below for details):

- the time the job has been waiting in the queue
- the share of the project's compute time that has already been used
- a special priority granted as "quality of service" (QOS) to specific projects or kinds of usage

Thus, a job gets an especially high priority if it

- has already been in the queue for a long time (age_factor)
- runs under an account that has not yet used its share of compute time (FairShare_factor)
- is associated with a high priority for other reasons, e.g. a QOS that raises its priority (QOS_factor)
SLURM job priority calculation

On Levante we use SLURM's Multifactor Priority plugin to influence job priority. A job's priority at any given time is a weighted sum of the following factors:

- age_factor ∈ [0,1], reaching 1 when the job's age exceeds PriorityMaxAge (30 days, 0 hours)
- FairShare_factor ∈ [0,1], as explained below
- QOS_factor ∈ [0,1], normalized according to 'sacctmgr show qos' (e.g. normal = 0, express = 0.1, bench = 1)

Each factor enters the sum with a configured weight. The final priority is then calculated as

Job_priority = PriorityWeightAge * age_factor + PriorityWeightFairshare * FairShare_factor + PriorityWeightQOS * QOS_factor
and can be checked with the sprio command:
PRIORITY  = AGE + FAIRSHARE + QOS ∈ [0,3000]
AGE       = weighted age priority ∈ [0,1000]
FAIRSHARE = weighted fair-share priority ∈ [0,1000]
QOS       = weighted quality-of-service priority ∈ [0,1000]
While squeue has format options (%p and %Q) that display a job’s composite priority, sprio can be used to display a breakdown of the priority components for each job, e.g.
$ sprio
  JOBID PRIORITY   AGE  FAIRSHARE   QOS
1421556     1175   100        975   100
2015831      274    20        204    50
2017372      258     0        258     0
...
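The weighted sum above can be sketched in a few lines of Python. This is an illustrative model, not SLURM's implementation: the weight values of 1000 per component are inferred from the [0,1000] component ranges reported by sprio, and the function names are hypothetical.

```python
# Illustrative model of the multifactor priority formula described above.
# Assumption: each PriorityWeight* is 1000, inferred from the sprio
# component ranges; the real values live in the SLURM configuration.

PRIORITY_MAX_AGE_SECONDS = 30 * 24 * 3600  # PriorityMaxAge = 30 days, 0 hours

PRIORITY_WEIGHT_AGE = 1000
PRIORITY_WEIGHT_FAIRSHARE = 1000
PRIORITY_WEIGHT_QOS = 1000


def age_factor(queue_wait_seconds: int) -> float:
    """Grows with queue wait time and saturates at 1 once PriorityMaxAge is exceeded."""
    return min(queue_wait_seconds / PRIORITY_MAX_AGE_SECONDS, 1.0)


def job_priority(queue_wait_seconds: int,
                 fair_share_factor: float,
                 qos_factor: float) -> int:
    """Weighted sum of the three normalized factors, each in [0, 1]."""
    return round(PRIORITY_WEIGHT_AGE * age_factor(queue_wait_seconds)
                 + PRIORITY_WEIGHT_FAIRSHARE * fair_share_factor
                 + PRIORITY_WEIGHT_QOS * qos_factor)


# A job queued for 3 days under a mostly unused account with express QOS,
# mirroring the first row of the sprio output above (100 + 975 + 100):
print(job_priority(3 * 24 * 3600, 0.975, 0.1))  # 1175
```

With these assumed weights, a job that has waited 3 of the 30 days of PriorityMaxAge contributes an AGE component of 100, which matches the breakdown sprio prints.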
SLURM accounting storage
For each SLURM job the accounting database stores the computing cycles delivered by a machine in the units of allocated_cpus * wall_clock_seconds.
Hence, one node with 256 logical CPUs used for one hour in the compute partition is accounted internally as
1 NodeHour = 3600 s * 256 CPUs = 921600 CPUsec
HLRE projects are accounted in node hours (as shown at https://luv.dkrz.de/projects/).
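The bookkeeping above can be sketched as a simple conversion. The function names are illustrative; the constant of 256 logical CPUs per node matches the compute-partition example in the text.

```python
# Sketch of the accounting arithmetic described above: SLURM stores
# allocated_cpus * wall_clock_seconds, and projects are billed in node
# hours. Assumes a node with 256 logical CPUs, as in the text.

LOGICAL_CPUS_PER_NODE = 256
SECONDS_PER_HOUR = 3600


def cpu_seconds(allocated_cpus: int, wall_clock_seconds: int) -> int:
    """Raw accounting unit stored in the SLURM database."""
    return allocated_cpus * wall_clock_seconds


def node_hours(cpusec: int) -> float:
    """Convert stored CPU-seconds into billed node hours."""
    return cpusec / (LOGICAL_CPUS_PER_NODE * SECONDS_PER_HOUR)


# One full node (256 logical CPUs) for one hour:
full_node_hour = cpu_seconds(256, 3600)
print(full_node_hour)               # 921600 CPUsec
print(node_hours(full_node_hour))   # 1.0 node hour
```

Note that the conversion depends only on the CPU-second total: a job on 128 CPUs for two hours accumulates the same 921600 CPUsec, and is billed the same one node hour.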