SLURM Status Commands

From UFRC

Checking on the queue

The basic command to monitor your workload is squeue. The full documentation for squeue is available on the SLURM web page, but we hope these examples are useful as they are and as templates for further customization. On HiPerGator (HPG), you can use the command "squeuemine" to see your own jobs in the queue.

For a list of jobs running under a particular group, use the -A flag (for Account) with the group name.

  • squeue -A group_name

For a summary similar to the MOAB/Torque showq command (-u user or -A group can be added to narrow the output):

  • squeue -o "%.10A %.18u %.4t %.8C %.20L %.30S"

To include the QOS and limit the output to a group:

  • squeue -O jobarrayid,qos,name,username,timelimit,numcpus,reasonlist -A group_name
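As a sketch of further customization, the output of squeue can be piped into a small helper to count jobs per state. The function name count_states is hypothetical (not part of SLURM); it only assumes that each input line begins with a state code, as produced by the %t format field with the header suppressed (-h).

```shell
#!/bin/sh
# count_states (hypothetical helper): read one job state code per line
# on stdin and print "STATE COUNT" pairs, sorted by state code.
count_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# On the cluster (not run here, needs SLURM):
#   squeue -h -A group_name -o "%t" | count_states
```

With sample input of two running jobs and one pending job, the helper prints one line for PD and one for R.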

Checking job information

The basic command is sacct. The full documentation for sacct is available on the SLURM web page, but we hope these examples are useful as they are and as templates for further customization.

By default, sacct shows only your jobs that have been queued or running since midnight of the current day. To view jobs from an earlier date, specify a start time (-S or --starttime) in one of several accepted formats, for example since May 1st (0501):

sacct -S 0501
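The start time can also be computed rather than typed by hand. The following is a minimal sketch, assuming GNU date is available (as on most Linux clusters); the function name days_ago is hypothetical. It prints a date in YYYY-MM-DD form, which is one of the formats sacct accepts for -S.

```shell
#!/bin/sh
# days_ago (hypothetical helper): print the date N days before today
# as YYYY-MM-DD. Assumes GNU date, which supports -d "N days ago".
days_ago() {
    date -d "$1 days ago" +%Y-%m-%d
}

# On the cluster (not run here, needs SLURM):
#   sacct -S "$(days_ago 7)"
```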

The default columns displayed are:

JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 

Other information can be pulled either from the -l view, which has a long list of columns, or by specifying the fields you want to view. For example, to see the number of CPUs, maximum memory use (MaxRSS), and walltime of all jobs since May 1st (0501), you could use:

sacct -S 0501 -o JobIDRaw,JobName,NCPUS,MaxRSS,Elapsed

To do the same for a whole group:

sacct -S 0501 -o JobIDRaw,JobName,User,NCPUS,MaxRSS,Elapsed -a -A group_name
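The Elapsed column is printed as a clock string such as 02:03:04 or, for jobs longer than a day, 1-02:03:04. When post-processing sacct output it can help to convert that to seconds. This is a sketch only; elapsed_to_seconds is a hypothetical helper name, and it assumes the [DD-]HH:MM:SS or MM:SS layouts.

```shell
#!/bin/sh
# elapsed_to_seconds (hypothetical helper): convert a SLURM elapsed
# time string ([DD-]HH:MM:SS or MM:SS) to a number of seconds.
elapsed_to_seconds() {
    awk -v t="$1" 'BEGIN {
        days = 0
        # Split off an optional leading day count ("1-02:03:04").
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
        else if (n == 2) secs = p[1] * 60 + p[2]
        else             secs = p[1]
        print days * 86400 + secs
    }'
}

# Example: elapsed_to_seconds 1-02:03:04
```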

To view memory use of jobs:

sacct --format=User,JobID,ReqMem,MaxRss

The above operations get information about completed jobs from the SLURM database. To look at currently running jobs, use the sstat command. For example:

 sstat -j 123456.batch -o maxrss
    MaxRSS
----------
 16111996K
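MaxRSS values such as the one above carry a unit suffix and are easier to compare against a memory request once converted. The following sketch converts a value to GiB; rss_to_gib is a hypothetical helper, and treating a bare number without a suffix as bytes is an assumption.

```shell
#!/bin/sh
# rss_to_gib (hypothetical helper): convert a MaxRSS value with a
# K, M, G or T suffix (binary multiples) to GiB, two decimal places.
rss_to_gib() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v))   # last character, e.g. "K"
        num  = v + 0                  # awk reads the leading number
        if      (unit == "K") f = 1024
        else if (unit == "M") f = 1024 ^ 2
        else if (unit == "G") f = 1024 ^ 3
        else if (unit == "T") f = 1024 ^ 4
        else                  f = 1   # assumption: no suffix means bytes
        printf "%.2f\n", num * f / (1024 ^ 3)
    }'
}

# Example with the value shown above: rss_to_gib 16111996K
```

For the sstat output above, 16111996K corresponds to roughly 15.4 GiB.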

See the sstat manual page on the cluster (man sstat) for more details, or go to https://slurm.schedmd.com/sstat.html.