SLURM Partition Limits

From UFRC

Revision as of 18:57, 20 April 2023

Each set of hardware resources is presented as a SLURM partition, and each partition has its own time limits.

Interactive Work

Partitions: hpg-dev, gpu, hpg-ai

  • Default (used when no time limit is specified): 10 min
  • hpg-dev Maximum: 12 hours
  • gpu
    • Maximum: 12 hours for srun .... --pty bash -i sessions
    • Maximum: 72 hours for Jupyter sessions in Open OnDemand.
  • hpg-ai
    • Maximum: 12 hours for srun .... --pty bash -i sessions
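
As a rough illustration of the interactive limits above, the following sketch builds (but does not run) an srun command that requests a session on hpg-dev at the 12-hour maximum. The `--partition` and `--time` options are standard srun options; the variable names are illustrative, and any other options a real session needs (the ones elided as `....` above) are deliberately left out.

```shell
#!/bin/bash
# Illustrative values: partition name and limit taken from the list above.
partition="hpg-dev"
time_limit="12:00:00"   # hours:minutes:seconds, the interactive maximum

# Assemble the command as a string so it can be inspected before use.
cmd="srun --partition=${partition} --time=${time_limit} --pty bash -i"
echo "${cmd}"
```

If `--time` is omitted, the 10 min default applies, so an interactive session would end after ten minutes.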

Jupyter

  • JupyterHub: Sessions are preset with individual limits shown in the menu
  • JupyterLab in Open OnDemand: Maximum 72 hours for the GPU partition; other partitions follow standard partition limits

GPU/HPG-AI Partitions

  • Default: 10 min
  • Maximum: 14 days*
 * Note: this change may be temporary. Our analysis of GPU usage patterns allowed us to increase the maximum job time limit from 7 days to 14 days. If resource utilization approaches levels that would prevent us from satisfying the investment QOS SLA, the time limit will be adjusted downwards to prevent service level degradation.
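
A minimal sketch of a batch script header that requests the current 14-day maximum on the gpu partition might look like the following. The GPU count and the final echo line are placeholders chosen for illustration, not site recommendations; SLURM accepts time limits in `days-hours:minutes:seconds` form.

```shell
#!/bin/bash
# Request the gpu partition (hpg-ai would be requested the same way).
#SBATCH --partition=gpu
# Placeholder GPU count; adjust to what the job actually needs.
#SBATCH --gpus=1
# 14 days, written as days-hours:minutes:seconds.
#SBATCH --time=14-00:00:00

# Placeholder workload: report where the job is running.
echo "Job ${SLURM_JOB_ID:-unknown} running on $(hostname)"
```

Because the maximum may be lowered again, a job that asks for more than the limit in force at submission time would need a shorter `--time` value.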

Compute Partitions

Partitions: hpg-default, hpg2-compute, bigmem

Both the hpg-default and the hpg2-compute partitions are selected by default if no partition is specified for a job.

Investment QOS

  • Default: 10 min
  • Maximum: 31 days (744 hours)

Burst QOS

  • Default: 10 min
  • Maximum: 4 days (96 hours)
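
The day and hour figures for the two QOS levels can be checked, and turned into `--time` values, with a small shell sketch like this one. The function names `days_to_hours` and `days_to_slurm_time` are hypothetical helpers, not part of SLURM.

```shell
#!/bin/bash
# Convert a whole number of days to hours.
days_to_hours() {
    echo $(( $1 * 24 ))
}

# Format a whole number of days as a SLURM time string
# (days-hours:minutes:seconds).
days_to_slurm_time() {
    printf '%d-00:00:00\n' "$1"
}

investment_max=$(days_to_slurm_time 31)
burst_max=$(days_to_slurm_time 4)

echo "Investment QOS maximum: ${investment_max} ($(days_to_hours 31) hours)"
echo "Burst QOS maximum: ${burst_max} ($(days_to_hours 4) hours)"
```

The resulting strings can be passed to `--time` in an sbatch or srun request.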

Hardware Accelerated GUI

Partition: hwgui

  • Default: 10 min
  • Maximum: 4 days (96 hours)