==Choosing QOS for a Job==

Back to [[Account and QOS limits under SLURM]]

When choosing between the high-priority investment QOS and the 9x larger low-priority burst QOS, you should start by considering the overall resource requirements for the job. For smaller allocations the investment QOS may not be large enough for some jobs, whereas for other smaller jobs the wait time in the burst QOS could be too long. In addition, consider the current state of the account you are planning to use for your job.

Guidelines for using the burst QOS:
* Submit only non-time-critical jobs to the Burst QOS.
* Parallelize analyses to make sure they can run within the 4-day window.
* Let the scheduler take its time to find unused resources to run burst jobs.
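If a job is not time-critical and can finish within the 4-day window, it can be directed to the burst QOS in the job script. The sketch below is only an illustration: the account name ufgi and its burst QOS ufgi-b are taken from the example later on this page, the resource requests and the my_analysis command are placeholders, and the investment QOS is assumed to share the account's name.

#!/bin/bash
#SBATCH --job-name=burst_example    # placeholder job name
#SBATCH --account=ufgi              # account from the example below
#SBATCH --qos=ufgi-b                # burst QOS; use --qos=ufgi for the investment QOS
#SBATCH --cpus-per-task=4           # placeholder CPU request
#SBATCH --mem=8gb                   # placeholder memory request
#SBATCH --time=96:00:00             # must fit within the 4-day burst time limit

module load ufrc
my_analysis                         # placeholder for the actual analysis command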
To show the status of any SLURM account as well as the overall usage of HiPerGator resources, use the following command from the UFRC module:
$ module load ufrc
$ slurmInfo
for the primary account, or
$ slurmInfo <account>
for another account.
Example:
$ slurmInfo ufgi
----------------------------------------------------------------------
Allocation summary:      Time Limit     Hardware Resources
   Investment QOS           Hours       CPU    MEM(GB)    GPU
----------------------------------------------------------------------
             ufgi             744       150      527        0
----------------------------------------------------------------------
CPU/MEM Usage:          Running          Pending          Total
                      CPU  MEM(GB)     CPU  MEM(GB)     CPU  MEM(GB)
----------------------------------------------------------------------
 Investment (ufgi):   100     280        0       0      100     280
----------------------------------------------------------------------
HiPerGator Utilization
        CPU: Used (%) / Total        MEM(GB): Used (%) / Total
----------------------------------------------------------------------
 Total :   43643 (92%) / 47300       113295500 (57%) / 196328830
----------------------------------------------------------------------
* Burst QOS uses idle cores at low priority with a 4-day time limit

Run 'slurmInfo -h' to see all available options
The output shows that the investment QOS for the ufgi account is actively used. Since 100 of the 150 available CPU cores are in use, only 50 cores remain available. Similarly, since 280 GB of the 527 GB of memory in the investment QOS are in use, 247 GB are still available. The ufgi-b burst QOS is unused. The total HiPerGator utilization is 92% of all CPU cores and 57% of all memory on compute nodes, which means there is little spare capacity from which burst resources can be drawn. In this case a job submitted to the ufgi-b QOS would likely take a long time to start. If the overall utilization were below 80%, it would be easier to start a burst job within a reasonable amount of time. When the HiPerGator load is high, or if the burst QOS is actively used, the investment QOS is more appropriate for a smaller job.
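For example, in the state shown above a smaller job that fits within the remaining investment capacity (50 cores and 247 GB) would be better served by the investment QOS. The QOS can also be chosen at submission time without editing the job script; here job.sh is a placeholder script name, and the investment QOS is assumed to share the account's name:

$ sbatch --account=ufgi --qos=ufgi job.sh

If the overall utilization later drops and the burst QOS remains unused, the same script can be resubmitted with --qos=ufgi-b instead.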