SLURM Commands
See also: Sample SLURM Scripts
While there is a lot of documentation available on the SLURM web page, we provide these commands as a handy reference with examples. Have a favorite SLURM command? Users can edit the wiki pages, so please add your own examples.
Check Job/Queue Status
A simple tool is scontrol show job jobid
See our SLURM Status Commands page for more commands that give helpful information about your running and pending jobs.
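As a quick illustration (the job ID 12345678 below is only a placeholder), you can list your own jobs and then inspect a single one in detail:
# List your own pending and running jobs
squeue -u $USER
# Show the full scheduler record for one job; replace 12345678 with a real job ID
scontrol show job 12345678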
Submit a Job
Submit a job script to the SLURM scheduler with
sbatch script
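For reference, a minimal sketch of such a script is shown below; the job name, file names, and resource values are placeholders, and the Sample SLURM Scripts page has complete, site-specific templates.
#!/bin/bash
#SBATCH --job-name=example        # placeholder job name
#SBATCH --output=example_%j.log   # %j expands to the job ID
#SBATCH --ntasks=1                # run a single task
#SBATCH --cpus-per-task=1         # one CPU core for that task
#SBATCH --mem=1gb                 # total memory for the job
#SBATCH --time=00:10:00           # walltime limit (hh:mm:ss)
echo "Running on $(hostname)"
Submitting it with sbatch example.sh prints the new job ID, which can then be passed to the status and cancel commands described on this page.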
Interactive Session
An interactive SLURM session, i.e. a shell prompt within a running job, can be started with
srun <resources> --pty bash -i
For example, a single-node job with 2 CPU cores and 2 GB of RAM for 90 minutes can be started with
srun --ntasks=1 --cpus-per-task=2 --mem=2gb -t 90 --pty bash -i
Canceling Jobs
Cancel a single job with
scancel jobID
or, for canceling multiple jobs whose names follow a wildcard pattern:
scancel pattern
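scancel also accepts a -u flag to cancel all of your own jobs at once, and a name-based wildcard match can be approximated by filtering squeue output; the following is a minimal sketch that assumes your job names share the placeholder prefix myjob_:
# Cancel every job that belongs to you
scancel -u $USER
# Cancel only your jobs whose names start with myjob_ (placeholder prefix)
squeue -u $USER -h -o "%i %j" | awk '$2 ~ /^myjob_/ {print $1}' | xargs -r scancel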
Using sreport to view group summaries
The basic command is sreport. The full documentation for sreport is available on the SLURM web page, but we hope these examples are useful as they are and as templates for further customization.
To view a summary of group usage since a given date (May 1st in this example):
sreport cluster AccountUtilizationByUser Start=0501 Accounts=group_name
Or, with a starting date and custom output formatting, plus parsable (-p) output to paste into a spreadsheet (replace MMDD and GROUP with the start date and group name):
sreport -p cluster AccountUtilizationByUser Format=Login%20,Proper%30,Used Start=MMDD Accounts=GROUP
Or for a particular month (the month of May):
sreport cluster AccountUtilizationByUser Start=0501 End=0531 Accounts=group_name
Or, for a specific date range with usage reported in hours:
sreport -t Hours cluster AccountUtilizationByUser Start=2022-01-01T00:00:00 End=2022-01-31T23:59:59 Accounts=group_name
Viewing Resources Available to a Group
To check the resources available to a group for running jobs, you can use the sacctmgr command (substitute group_name with your group's name):
sacctmgr show qos group_name format="Name%-16,GrpSubmit,MaxWall,GrpTres%-45"
or for the burst allocation:
sacctmgr show qos group_name-b format="Name%-16,GrpSubmit,MaxWall,GrpTres%-45"
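A related, generic sacctmgr query (not specific to any one cluster) lists the accounts and QOSes your own user can submit under, which helps when filling in group_name above:
sacctmgr show assoc user=$USER format="Account%-20,QOS%-40"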
Using sinfo to view partition information and node features
sinfo is one of the commands users can run to learn about the resources managed by SLURM. It provides information on the configuration of partitions and the details of the nodes within each partition. Using sinfo, users can view the features assigned to the nodes and then use those features as constraints when submitting jobs, for example to request only nodes with Intel processors.
sinfo -s
Provides a summary of the partitions and the nodes within each, including counts of nodes that are allocated, idle, other (e.g. offline), and the total (the A/I/O/T column).
sinfo -o %P,%D,%c,%X,%m,%f
or
module load ufrc
nodeInfo
Shows the partitions, number of nodes, number of cores per node, number of sockets per node, amount of RAM per node, and the features associated with the nodes. These features can be used to request constraints in sbatch. For example:
#SBATCH --partition=hpg2-compute
#SBATCH --constraint='hpg2'
Would constrain a job to run on one of the 32-core AMD nodes from HiPerGator 2.
While constraints can be used to target particular resources, users should realize that using constraints also limits where a job can run and may delay scheduling a job.
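Constraints can also be combined: SLURM accepts & (AND) and | (OR) operators inside --constraint. The feature names below are only illustrative, so check the nodeInfo or sinfo feature (%f) output for the features actually defined on the nodes, and use only one --constraint line per script:
#SBATCH --constraint='intel|amd'   # run on nodes that carry either feature
#SBATCH --constraint='hpg2&amd'    # require nodes that carry both features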
Get stored job scripts and environment variables of past jobs
In SLURM version 22 and later, the job script and environment variables of each batch job are automatically stored in the accounting database and can be recalled conveniently:
sacct --batch-script -j <JOB_ID>
The above command prints the job script used by the past job <JOB_ID> to standard output.
To recall the environment variables of a past job <JOB_ID>:
sacct --env-vars -j <JOB_ID>
The above command prints the environment variables used by the past job <JOB_ID> to standard output.
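For example, to save both pieces of information for a past job to files for reuse (12345678 is a placeholder job ID):
sacct --batch-script -j 12345678 > job_12345678_script.sh
sacct --env-vars -j 12345678 > job_12345678_env.txt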