SLURM Commands

While there is a lot of documentation available on the SLURM web page, we provide these commands to help users with examples and handy references. Have a favorite SLURM command? Users can edit the wiki pages, so please add your examples.

==Checking on the queue==

The basic command is squeue. The full documentation for squeue is available on the SLURM web page, but we hope these examples are useful as they are and as templates for further customization.

For a list of jobs running under a particular group, use the -A flag (for Account) with the group name.

 squeue -A group_name
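
Similarly, to list only the jobs of a single user, the -u (user) flag can be used; user_name below is a placeholder for a real user name:

 squeue -u user_name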

For a summary that is similar to the MOAB/Torque showq command (again, -u user or -A group can be added):

 squeue -o "%.10A %.18u %.4t %.8C %.20L %.30S"
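
If you use a custom layout frequently, squeue can also read a default format from the SQUEUE_FORMAT environment variable, so the -o option does not have to be repeated each time (a minimal sketch, assuming a bash shell):

<pre>
export SQUEUE_FORMAT="%.10A %.18u %.4t %.8C %.20L %.30S"
squeue -A group_name
</pre>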

To include qos and limit to a group:

 squeue -A group_name -O jobarrayid,qos,name,username,timelimit,numcpus,reasonlist
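
The -t (state) flag can further restrict the listing, for example to pending jobs only (a hedged variation of the command above; PD is the compact code for the pending state):

 squeue -A group_name -t PD -O jobarrayid,qos,name,username,timelimit,numcpus,reasonlist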


==Checking job information==

The basic command is sacct. The full documentation for sacct is available on the SLURM web page, but we hope these examples are useful as they are and as templates for further customization.

By default, sacct will only show your jobs that have been queued or running since midnight of the current day. To view jobs from an earlier date, you can specify a start time (-S or --starttime) in one of a number of formats, for example since May 1st (0501):

 sacct -S 0501
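
Full dates are also accepted, and an end time can be given with -E; for example, to cover only the month of May (a sketch that assumes a specific year purely for illustration):

 sacct -S 2016-05-01 -E 2016-05-31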

The default columns displayed are:

 JobID    JobName  Partition    Account  AllocCPUS      State ExitCode

Other information can be pulled either from the -l view, which has a long list of columns, or by specifying the columns you want to view. For example, to see the number of CPUs, maximum memory use, and walltime of all of your jobs since May 1st (0501), you could use:

 sacct -S 0501 -o JobIDRaw,JobName,NCPUS,MaxRSS,Elapsed

To do the same for a whole group:

 sacct -S 0501 -o JobIDRaw,JobName,User,NCPUS,MaxRSS,Elapsed -a -A group_name
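
The same columns can also be requested for a single job with the -j flag (job_id below is a placeholder for a real job ID):

 sacct -j job_id -o JobIDRaw,JobName,NCPUS,MaxRSS,Elapsed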

To view memory use of jobs:

 sacct --format=User,JobID,ReqMem,MaxRSS
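
Note that sacct reports memory from the accounting records, which are most complete for finished job steps; for a job that is still running, the sstat command can report current memory use of its steps (a hedged sketch, with job_id again a placeholder):

 sstat -j job_id --format=JobID,MaxRSS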

==Using sreport to view group summaries==

The basic command is sreport. The full documentation for sreport is available on the SLURM web page, but we hope these examples are useful as they are and as templates for further customization.

To view a summary of group usage since a given date (May 1st in this example):

 sreport cluster AccountUtilizationByUser Start=0501 Accounts=group_name

Or for a particular month (the month of May):

 sreport cluster AccountUtilizationByUser Start=0501 End=0531 Accounts=group_name
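
sreport normally displays usage in minutes; the -t option can switch the report to other units, such as hours (a hedged example based on the first command above):

 sreport -t Hours cluster AccountUtilizationByUser Start=0501 Accounts=group_name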

==Viewing Resources Available to a Group==

To check the resources available to a group for running jobs, you can use the sacctmgr command (substitute group_name with your group's name):

 sacctmgr show qos group_name format="Name%-16,GrpSubmit,MaxWall,GrpTres%-45"

or for the burst allocation:

 sacctmgr show qos group_name-b format="Name%-16,GrpSubmit,MaxWall,GrpTres%-45"
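
To see which QOSes your own user is allowed to submit to, the association records can also be listed (a hedged sketch; user_name is a placeholder for a real user name):

 sacctmgr show assoc user=user_name format="Account,User,QOS%-40"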

==Requesting Resources==

===Requesting an entire node===

It is possible to request an entire node for your work. However, please note that doing so counts against the resources available to your group. Since the nodes in HiPerGator 2.0 have 32 cores, taking an entire node uses 32 cores of your group's allocation for as long as the job runs.

To effectively reserve an entire node, you can put the following in your script:

<pre>
#SBATCH --nodes=1
#SBATCH --ntasks=32
</pre>

SLURM will not oversubscribe the node, so if you ask for all 32 cores, you will get exclusive access to the node.
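
As an alternative, SLURM also provides an --exclusive option that requests whole-node access without hard-coding the core count; this is a general SLURM directive rather than something specific to this page, so check how it interacts with your group's allocation before relying on it:

<pre>
#SBATCH --nodes=1
#SBATCH --exclusive
</pre>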