Difference between revisions of "Slurm"

Revision as of 21:03, 27 May 2022

Description

slurm website  

HiPerGator and most other supercomputers are not used the same way as personal desktops, laptops, or workstations. Their massive computing power requires a sophisticated approach to scheduling workloads so that hardware resources are used efficiently, allocation limits are honored, and users and groups have a fair chance of using the resources without interfering with each other. Two pieces of software, a resource manager and a scheduler, fulfill these and other functions. On HiPerGator, Slurm manages hardware resources and schedules workloads, whether they are submitted directly to the scheduler via job scripts or behind the scenes through more convenient interfaces like Open OnDemand, Galaxy, or Jupyterhub.

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work. Optional plugins can be used for accounting, advanced reservation, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology-optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.

Slurm Jobs

For a list of sample Slurm scripts, please see Sample SLURM scripts (https://help.rc.ufl.edu/doc/Sample_SLURM_Scripts).
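
To make the three functions described above concrete, here is a minimal sketch of a job script. It is not taken from the UFRC sample scripts; the job name, log file name, and resource values are hypothetical placeholders, and the settings your group actually needs (such as account or QOS options) are not shown. The `#SBATCH` lines request an allocation, `srun` starts the work on the allocated resources, and the job waits in the queue until those resources are free.

```shell
#!/bin/bash
#SBATCH --job-name=hello_slurm     # hypothetical job name
#SBATCH --ntasks=1                 # one task (process)
#SBATCH --cpus-per-task=1          # one CPU core for that task
#SBATCH --mem=1G                   # memory for the job (placeholder value)
#SBATCH --time=00:05:00            # wall-time limit, hh:mm:ss
#SBATCH --output=hello_%j.log      # %j is replaced by the job ID

# Slurm sets SLURM_JOB_ID inside a job; fall back to "local" when the
# script is run by hand outside the scheduler (assumption for this sketch).
job_id="${SLURM_JOB_ID:-local}"
echo "Job ${job_id} starting on $(uname -n)"

# Inside a Slurm job, launch the work as a job step with srun.
# Outside Slurm, run the command directly so the sketch still executes.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    launcher="srun"
else
    launcher=""
fi

$launcher echo "Hello from Slurm"
```

Assuming the script is saved as `hello.sh` (a hypothetical file name), it would be submitted with `sbatch hello.sh`, and `squeue -u $USER` would show whether the job is still pending in the queue or already running.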