Submitting Array Jobs

From UFRC
Revision as of 15:36, 5 May 2023 by Israel.herrera

Submitting array jobs

A job array can be submitted simply by adding

#SBATCH --array=x-y

to the job script, where x and y are the lower and upper bounds of the array index range. A job array can also be specified on the command line with

sbatch --array=x-y job_script.sbatch

A job array is then created with one independent job (an array task) for each index in the defined range.
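
As a minimal sketch of what such a job script might look like, the following assumes a hypothetical set of input files named input_1.txt through input_10.txt. It relies on the SLURM_ARRAY_TASK_ID environment variable, which SLURM sets to the index of each array task. The job name, time limit, file naming scheme, and the input_for_task helper are illustrative assumptions, not site defaults.

```shell
#!/bin/bash
#SBATCH --job-name=array_demo    # hypothetical job name
#SBATCH --array=1-10             # one array task per index from 1 to 10
#SBATCH --time=00:05:00          # illustrative time limit

# Map an array index to an input file name (hypothetical naming scheme).
input_for_task() {
    printf 'input_%s.txt\n' "$1"
}

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be tried outside of SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
echo "Task ${task_id} would process $(input_for_task "${task_id}")"
```

Each array task runs the same script and differs only in the value of SLURM_ARRAY_TASK_ID, so the index is the natural key for selecting per-task input.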

SLURM's job array handling is very versatile. Instead of a task range, a comma-separated list of task numbers can be provided, for example to rerun a few failed tasks from a previously completed job array:

sbatch --array=4,8,15,16,23,42 job_script.sbatch

Command line options override options in the script, so the script itself can be left unchanged.
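
When the failed task numbers are already known, the comma-separated list can be assembled by a small helper. The sketch below is only an illustration: array_spec is a hypothetical function name, and the sbatch command is printed rather than submitted so that it can be inspected first.

```shell
#!/bin/bash
# Join the task numbers given as arguments into a comma-separated list
# suitable for --array. (array_spec is a hypothetical helper name.)
array_spec() {
    # The subshell keeps the IFS change from leaking into the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

# Task numbers taken from the example above.
spec="$(array_spec 4 8 15 16 23 42)"

# Print the command for review instead of running it.
echo "sbatch --array=${spec} job_script.sbatch"
```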

Limiting the number of tasks that run at once

To throttle a job array so that only a certain number of tasks are active at a time, use the %N suffix, where N is the maximum number of simultaneously active tasks. For example

#SBATCH -a 1-200%5

will produce a 200-task job array with at most 5 tasks active at any given time.

Note that although the symbol used is the % sign, N is not a percentage: it is the actual number of tasks allowed to run at once.
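
Because the value after % is a plain task count, a throttled specification can be built from three integers. The following sketch assumes a hypothetical helper named throttled_spec; passing 0 as the limit omits the suffix entirely, which is a convention of this example rather than of SLURM.

```shell
#!/bin/bash
# Build an --array value from the first index, the last index, and the
# maximum number of simultaneously running tasks.
# (throttled_spec is a hypothetical helper name.)
throttled_spec() {
    if [ "$3" -gt 0 ]; then
        # %% in the printf format produces a literal % sign.
        printf '%s-%s%%%s\n' "$1" "$2" "$3"
    else
        printf '%s-%s\n' "$1" "$2"
    fi
}

echo "#SBATCH --array=$(throttled_spec 1 200 5)"   # prints: #SBATCH --array=1-200%5
```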

Using scontrol to modify throttling of running array jobs

Reducing the "ArrayTaskThrottle" count on a running job array will not affect the tasks that have already entered the "RUNNING" state. It will only prevent new tasks from starting until the number of running tasks drops below the new, lower threshold.

If you want to change the number of simultaneous tasks of an active job, you can use scontrol:

scontrol update ArrayTaskThrottle=<count> JobId=<jobID>

For example:

scontrol update ArrayTaskThrottle=50 JobId=12345

Set ArrayTaskThrottle=0 to eliminate any limit.
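
For repeated adjustments, the scontrol call above could be wrapped in a small function that checks its arguments first. This is a sketch under stated assumptions: set_array_throttle and the DRY_RUN variable are hypothetical names introduced here, and the check accepts only plain numeric job IDs.

```shell
#!/bin/bash
# Validate the arguments, then change the throttle of a job array.
# Usage: set_array_throttle <jobID> <count>
# With DRY_RUN set to a non-empty value, the command is only printed.
# (set_array_throttle and DRY_RUN are hypothetical names.)
set_array_throttle() {
    for value in "$1" "$2"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "usage: set_array_throttle <jobID> <count>" >&2
                return 1
                ;;
        esac
    done
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
    ${DRY_RUN:+echo} scontrol update ArrayTaskThrottle="$2" JobId="$1"
}

# Print the command that would be run for job 12345 with a limit of 50.
DRY_RUN=1 set_array_throttle 12345 50
```

Without DRY_RUN, the function runs scontrol directly, so it should only be called that way on a system where the job actually exists.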

Naming output and error files

SLURM uses the %A and %a replacement strings for the master job ID and task ID, respectively.

For example:

#SBATCH --output=Array_test.%A_%a.out
#SBATCH --error=Array_test.%A_%a.error

A separate error log is optional. If --error is not specified, both standard output and standard error are written to the 'output' log:

#SBATCH --output=Array_test.%A_%a.log

Note: if you use only '%A' in the log file name, all array tasks will try to write to a single file, and the performance of the run will approach zero asymptotically. Make sure to use both %A and %a in the log file name specification.
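
To see why both replacement strings are needed, the expansion can be mimicked in plain shell. The sketch below is not how SLURM performs the substitution internally; expand_log_name is a hypothetical helper, and the job ID 12345 is made up for illustration.

```shell
#!/bin/bash
# Mimic the %A / %a substitution.
# Arguments: file name pattern, master job ID, array task ID.
# (expand_log_name is a hypothetical helper name.)
expand_log_name() {
    printf '%s\n' "$1" | sed -e "s/%A/$2/g" -e "s/%a/$3/g"
}

# Both %A and %a: each task gets its own file.
expand_log_name 'Array_test.%A_%a.log' 12345 1   # Array_test.12345_1.log
expand_log_name 'Array_test.%A_%a.log' 12345 2   # Array_test.12345_2.log

# Only %A: both tasks resolve to the same file name.
expand_log_name 'Array_test.%A.log' 12345 1      # Array_test.12345.log
expand_log_name 'Array_test.%A.log' 12345 2      # Array_test.12345.log
```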