[[Category:SLURM]]
{{HPG2}}
 
=Account and QOS Use=
 
SLURM uses two arguments to determine which resource limits and priority apply to a job. The first argument is '--account'. It determines which investor group's allocation will be used for the job. If no account is specified, the account defaults to your primary group. To use one of your secondary groups you have to specify it explicitly with the account argument. For example,
 
 
 
#SBATCH --account=mygroup
 
or
 
#SBATCH --account=mysecondarygroup
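

If you are unsure which groups you belong to, the standard Linux 'id' command will list them. This is a generic sketch, not specific to HiPerGator, but the account names used above correspond to these group names:

id -gn    # primary group, i.e. the default account
id -Gn    # all groups, including any secondary ones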
 
 
 
The second argument that determines a job's priority and resource limits is '--qos'. There are two possible choices for QOS - the main investment QOS and the burst QOS, formerly known as the 'soft' and 'hard' limits under Torque/MOAB. The QOS has to be specified explicitly, as SLURM will not automatically move jobs between the two. The main QOS has a 744-hour time limit and a high priority to make sure that a group can fully use its investment without having to wait for resources. To specify the main QOS, use the group name in conjunction with the corresponding account. For example,
 
 
 
#SBATCH --account=mygroup
 
#SBATCH --qos=mygroup
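

To see the actual limits attached to a group's QOSes, SLURM's 'sacctmgr' command can be queried. This is a minimal sketch with 'mygroup' as a placeholder for the real group name; the exact fields available may vary with the SLURM version:

# show the group resource limits, wall-time limit, and priority of both QOSes
sacctmgr show qos mygroup,mygroup-b format=Name,GrpTRES,MaxWall,Priority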
 
 
 
The burst QOS has a 96-hour time limit and a lower priority, as it depends on the availability of spare capacity on HiPerGator, but it provides an additional 9x the resources of the main QOS, for a total of 10x the investment across the two QOSes. To specify the burst QOS, append '-b' to the group's name in the --qos argument. For example,
 
#SBATCH --account=mygroup
 
#SBATCH --qos=mygroup-b
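

Putting the pieces together, a complete burst-QOS job script could look like the following sketch; the account and QOS names, resource requests, and the 'hostname' payload are placeholders to adapt to your own job:

#!/bin/bash
#SBATCH --job-name=burst_test      # arbitrary job name
#SBATCH --account=mygroup          # investor group whose allocation is used
#SBATCH --qos=mygroup-b            # burst QOS: lower priority, 96-hour limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=96:00:00            # must stay within the 96-hour burst limit

hostname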
 
 
 
There is another limit under SLURM that applies to both QOS choices. A group can use all processor cores available to it under a particular QOS only as long as it stays under the total memory limit for that QOS. The total memory limit is calculated as 'QOS NCU * 3gb'. For example, a main QOS of 30 NCUs will have a group memory limit of 90gb, while the burst QOS for that group will be '30 * 3gb * 9 = 810gb'. If the group memory limit is reached, you will see a '(QOSGrpMemLimit)' status in the 'NODELIST(REASON)' column of the squeue output. For example,
 
 
 
squeue | grep MemLimit | head -n 1
 
            123456    bigmem test_job  jdoe PD      0:00      1 (QOSGrpMemLimit)
 
 
 
The above message can only be seen in the output of the 'squeue' command and does not interfere with job submission, but the job will stay queued until the group's memory use drops below its limit.
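
To see how many of a group's pending jobs are currently blocked on this limit, the 'squeue' output can be filtered by account and reason; 'mygroup' is a placeholder for the real account name:

# count pending jobs of account 'mygroup' that are waiting on the group memory limit
squeue -A mygroup -t PENDING -h -o "%r" | grep -c QOSGrpMemLimit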
 
 
 
If the submitted job is so large that its resource request falls outside of the total resource limit of the requested QOS, SLURM will refuse the job submission altogether and produce the following error:
 
 
 
;sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
 
 
 
=Examples=
 
 
 
A hypothetical group ($GROUP in the examples below) has an investment of 42 NCUs. That is the group's so-called ''soft limit'' for HiPerGator jobs in the main QOS, available for up to 744 hours at high priority. The hard limit, accessible through the so-called ''burst QOS'', adds 9 times that amount, giving the group potentially a total of 10x the invested resources, i.e. 420 NCUs, with the burst QOS providing 378 NCUs of that capacity for up to 96 hours at low priority.
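
As a quick sanity check, the corresponding group memory limits follow from the 'NCU * 3gb' rule described above; a trivial shell sketch of the arithmetic:

NCU=42
echo "Main QOS memory limit:  $(( NCU * 3 )) gb"      # 126gb, up to 744 hours at high priority
echo "Burst QOS memory limit: $(( NCU * 3 * 9 )) gb"  # 1134gb, up to 96 hours at low priority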
 
 
 
Let's test:
 
 
 
[marvin@gator ~]$ srun --mem=126gb --pty bash -i
 
 
 
srun: job 123456 queued and waiting for resources
 
 
 
<Looks good, let's terminate the request with Ctrl+C>
 
 
 
^C
 
 
 
srun: Job allocation 123456 has been revoked
 
 
 
srun: Force Terminated job 123456
 
 
 
 
 
On the other hand, going even 1gb over that limit results in the job limit error we encountered earlier:
 
 
 
[marvin@gator ~]$ srun --mem=127gb --pty bash -i
 
 
 
srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
 
 
 
 
 
 
 
At this point the group can try using the 'burst' QOS with
 
 
 
#SBATCH --qos=$GROUP-b
 
 
 
 
 
Let's test:
 
 
 
[marvin@gator3 ~]$ srun -p bigmem --mem=400gb --time=96:00:00 --qos=$GROUP-b --pty bash -i
 
 
 
srun: job  123457 queued and waiting for resources
 
 
 
<Looks good, let's terminate with Ctrl+C>
 
 
 
^C
 
 
 
srun: Job allocation 123457 has been revoked
 
 
 
srun: Force Terminated job 123457
 
 
 
 
 
However, now there's the burst qos time limit to consider:
 
 
 
[marvin@gator ~]$ srun --mem=400gb --time=300:00:00 --pty bash -i
 
 
 
srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
 
 
 
 
 
Let's reduce the time limit to what burst qos supports and try again:
 
 
 
 
 
 
 
[marvin@gator ~]$ srun --mem=400gb --time=96:00:00 --pty bash -i
 
 
 
srun: job  123458 queued and waiting for resources
 
 
 
<Looks good, let's terminate with Ctrl+C>
 
 
 
^C
 
 
 
srun: Job allocation 123458 has been revoked
 
 
 
srun: Force Terminated job 123458
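

As a side note, SLURM's '--time' option also accepts a 'days-hours:minutes:seconds' form, so the 96-hour burst limit can be requested either way; the two directives below are equivalent:

#SBATCH --time=96:00:00     # hours:minutes:seconds
#SBATCH --time=4-00:00:00   # days-hours:minutes:seconds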
 
 
 
=Group Limit Errors=
 
 
 
The following ''Reasons'' can be seen in the job queue for a group when the group reaches the resource limit for the respective account/qos combination:
 
 
 
;QOSGrpCpuLimit
 
:means that all CPU cores available for the listed account within the respective QOS are in use.
 
 
 
;QOSGrpMemLimit
 
:means that all memory available for the listed account within the respective QOS as described in the previous section is in use.
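
A quick way to check which of these reasons currently applies to a group's jobs is to include the reason column in the 'squeue' output; 'mygroup' is a placeholder for the real account name:

# list job id, QOS, user, state, and reason for all jobs of account 'mygroup'
squeue -A mygroup -o "%.10i %.10q %.8u %.2t %R"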
 
