Difference between revisions of "Slurm"

 
(6 intermediate revisions by 2 users not shown)

Latest revision as of 15:27, 10 July 2024

Description

Slurm website
See also HPG Scheduling

HiPerGator, like most supercomputers, is not used the same way as a personal desktop, laptop, or workstation. Its massive computing power requires a sophisticated approach to scheduling workloads so that hardware resources are used efficiently, allocation limits are honored, and users and groups have a fair chance of using the resources without interfering with each other. Two kinds of software, a resource manager and a scheduler, are required to fulfill these and other functions. On HiPerGator we use Slurm to manage hardware resources and schedule workloads, whether they are submitted directly to the scheduler via job scripts or run behind the scenes of more convenient interfaces such as Open OnDemand, Galaxy, or JupyterHub.

Slurm (originally an acronym for Simple Linux Utility for Resource Management) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions:

  • First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
  • Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.
  • Finally, it arbitrates contention for resources by managing a queue of pending work.

Optional plugins can be used for accounting, advanced reservation, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology-optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.

Using SLURM

Submitting a SLURM Job

  • Start interactive sessions. See Development and Testing and Open OnDemand.
  • Submit SLURM job scripts. For a list of sample Slurm scripts, please see Sample SLURM scripts (https://help.rc.ufl.edu/doc/Sample_SLURM_Scripts).

Slurm Environment Variables

For a list of the most common environment variables, see SLURM Environmental Variables (https://help.rc.ufl.edu/doc/SLURM_Environmental_Variables).
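
As an illustration of how such variables are typically used inside a job script, the sketch below reads a few of them and falls back to placeholder values when run outside a job. SLURM_JOB_ID, SLURM_JOB_NAME, SLURM_SUBMIT_DIR, and SLURM_CPUS_PER_TASK are standard Slurm variables; the fallback values, the resource request, and the use of OMP_NUM_THREADS for a threaded program are assumptions made for the example.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4          # example request; adjust to your job

# Each variable falls back to a placeholder so the script also runs outside Slurm.
job_id="${SLURM_JOB_ID:-none}"
job_name="${SLURM_JOB_NAME:-interactive}"
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"

# SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested.
cpus="${SLURM_CPUS_PER_TASK:-1}"

# Assumption: the program is OpenMP-based and should use the allocated cores.
export OMP_NUM_THREADS="${cpus}"

echo "job ${job_id} (${job_name}) submitted from ${submit_dir}, using ${OMP_NUM_THREADS} thread(s)"
```

Tying the thread count to SLURM_CPUS_PER_TASK keeps the program in step with the allocation if the resource request is changed later.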


Passing variables into a job at submission

You can pass variables into a Slurm job at submission time with the --export flag of sbatch. For example, to pass the values of the variables A and b into the job script named jobscript.sbatch, use either of the following:

  • sbatch --export=A=5,b='test' jobscript.sbatch
  • sbatch --export=ALL,A=4,b='test' jobscript.sbatch

The first form replaces the user's environment with a new environment containing only the values for A and b and the SLURM_* environment variables. The second adds the values for A and b to the existing environment.
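
On the receiving side, the job script sees A and b as ordinary environment variables. The following is a minimal sketch of what jobscript.sbatch might contain; the job name, resource requests, and the helper function describe_inputs are hypothetical and only serve to show how the passed values can be read.

```shell
#!/bin/bash
#SBATCH --job-name=export_demo     # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

# Hypothetical helper: report the values received through --export.
# The :- defaults apply only when a variable was not passed in.
describe_inputs() {
    echo "A=${A:-unset} b=${b:-unset}"
}

describe_inputs
```

Giving each variable a default makes it obvious in the job output when a value was not exported at submission.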

Using variables to set SLURM job name and output files

Slurm does not expand shell variables in the #SBATCH lines of a job script. However, options given on the command line take precedence over those defined in the job script, so the job name and output/error files can be set on the sbatch command line:

A=5
b='test'
sbatch --job-name=$A.$b.run --output=$A.$b.out --export=A=$A,b=$b jobscript.sbatch
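
The same pattern extends to submitting one job per parameter value. The sketch below is a dry run that only prints the sbatch commands it would issue; the helper build_submit_cmd and the values 1, 2, 3 are hypothetical, and it uses --export=ALL,... so that the existing environment is kept, which is a choice made for this example rather than a requirement.

```shell
#!/bin/bash
b='test'

# Hypothetical helper: assemble the sbatch command line for one value of A ($1).
build_submit_cmd() {
    echo "sbatch --job-name=${1}.${b}.run --output=${1}.${b}.out --export=ALL,A=${1},b=${b} jobscript.sbatch"
}

# Dry run: print one command per parameter value instead of submitting.
for A in 1 2 3; do
    build_submit_cmd "${A}"
done
```

Once the printed commands look right, they can be executed, for example by piping the script's output to bash.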

SLURM commands

See SLURM Commands to learn more about the commands available to control and monitor your jobs.

Bypassing X11 Requirement

If a program you need to run expects an X11 environment, you may need to set up a virtual one with Xvfb. Place the following lines in the job script between the module load command and the command that runs the program, and adapt them as needed.

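# Pick a display number, start Xvfb on it in the background, then point X clients at it.
XVFB_DISPLAY=${RANDOM}
Xvfb :${XVFB_DISPLAY} -screen 0 1280x960x24 &
export DISPLAY=:${XVFB_DISPLAY}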
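
A slightly more defensive variant is sketched below. It checks that Xvfb is available, remembers the server's process ID, and stops the server when the job script exits. The display-number range, the two-second wait, and the commented-out my_gui_program command are assumptions for illustration only.

```shell
#!/bin/bash
# Pick a display number; the 100-1099 range is an arbitrary choice for this sketch.
xvfb_display=":$(( RANDOM % 1000 + 100 ))"

if command -v Xvfb >/dev/null 2>&1; then
    # Start the virtual X server in the background and remember its PID.
    Xvfb "${xvfb_display}" -screen 0 1280x960x24 &
    xvfb_pid=$!
    # Stop the server when the job script exits.
    trap 'kill "${xvfb_pid}" 2>/dev/null' EXIT
    export DISPLAY="${xvfb_display}"
    sleep 2    # crude wait for the server to come up (assumption: 2 s is enough)
    # my_gui_program --batch    # hypothetical command that needs X11
else
    echo "Xvfb not found on PATH" >&2
fi
```

Cleaning up with a trap avoids leaving a stray Xvfb process behind on the compute node after the job's main command finishes.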

Avoiding Session Interruption

It is possible to have multiple sessions running simultaneously. HiPerGator offers options for terminal multiplexing. See Persistent Terminal Sessions for more details.