SLURM Job Arrays
Latest revision as of 18:50, 16 May 2023
Introduction and Submitting Arrays
To submit a number of identical jobs without having to drive the submission with an external script, use SLURM's array jobs feature. You can learn how to submit them at Submitting Array Jobs.
Note: There is a maximum limit of 3000 jobs per user on HiPerGator.
Using the array ID Index
SLURM will provide a $SLURM_ARRAY_TASK_ID variable to each task. It can be used inside the job script to handle input and output files for that task. To learn how and see some examples, visit Array ID Indexes.
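As a minimal sketch of the idea, the task ID can be turned into per-task file names. The naming scheme (input_N.txt, result_N.txt) and the program name my_program are assumptions made for illustration and do not come from this page; the fallback value of 1 is only there so the snippet can be tried outside the scheduler.

```shell
#!/bin/bash
# Sketch: map the array task ID to per-task input and output file names.
# SLURM sets SLURM_ARRAY_TASK_ID inside an array job; the fallback of 1
# only lets this snippet run outside the scheduler for a quick check.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}

# Hypothetical naming scheme: input_1.txt, input_2.txt, ...
INPUT_FILE="input_${TASK_ID}.txt"
OUTPUT_FILE="result_${TASK_ID}.txt"

echo "Task ${TASK_ID} reads ${INPUT_FILE} and writes ${OUTPUT_FILE}"

# my_program is a placeholder for your own executable:
# my_program "$INPUT_FILE" > "$OUTPUT_FILE"
```

Because every task derives its own file names from the ID, tasks do not overwrite each other's output.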
Running many short tasks
While SLURM array jobs make it easy to run many similar tasks, if each task is short (seconds or even a few minutes), array jobs quickly bog down the scheduler and more time is spent managing jobs than actually doing any work for you. This also negatively impacts other users.
If you have hundreds or thousands of tasks, it is unlikely that a simple array job is the best solution. That does not mean that array jobs are not helpful in these cases, but that a little more thought needs to go into them for efficient use of the resources.
Watch the video (10 min, 16 sec) discussing some of the issues and walking through the details of the example script below: https://mediasite.video.ufl.edu/Mediasite/Play/5bbd7cfb22b2416bbb0541e79875def51d
As an example let's imagine I have 5,000 runs of a program to do, with each run taking about 30 seconds to complete. Rather than running an array job with 5,000 tasks, it would be much more efficient to run 5 tasks where each completes 1,000 runs.
The sample script below accomplishes this by combining array jobs with bash loops.
#!/bin/bash
#SBATCH --job-name=mega_array          # Job name
#SBATCH --mail-type=ALL                # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=gatorlink@ufl.edu  # Where to send mail
#SBATCH --nodes=1                      # Use one node
#SBATCH --ntasks=1                     # Run a single task
#SBATCH --mem-per-cpu=1gb              # Memory per processor
#SBATCH --time=00:10:00                # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.out       # Standard output and error log
#SBATCH --array=1-5                    # Array range

# This is an example script that combines array tasks with
# bash loops to process many short runs. Array jobs are convenient
# for running lots of tasks, but if each task is short, they
# quickly become inefficient, taking more time to schedule than
# they spend doing any work and bogging down the scheduler for
# all users.
# The shebang is /bin/bash because the C-style for loop below is a bash feature.

pwd; hostname; date

# Set the number of runs that each SLURM task should do
PER_TASK=1000

# Calculate the starting and ending values for this task based
# on the SLURM task and the number of runs per task.
START_NUM=$(( ($SLURM_ARRAY_TASK_ID - 1) * $PER_TASK + 1 ))
END_NUM=$(( $SLURM_ARRAY_TASK_ID * $PER_TASK ))

# Print the task and run range
echo This is task $SLURM_ARRAY_TASK_ID, which will do runs $START_NUM to $END_NUM

# Run the loop of runs for this task.
for (( run=$START_NUM; run<=$END_NUM; run++ )); do
  echo This is SLURM task $SLURM_ARRAY_TASK_ID, run number $run
  # Do your stuff here
done

date
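The sample script works out evenly because 5,000 runs divide exactly into 5 tasks of 1,000. When the total is not a multiple of the chunk size, one possible approach is to round the number of tasks up and clamp the last task's end value. The sketch below illustrates that arithmetic; TOTAL_RUNS and NUM_TASKS are names introduced here for illustration, and the fallback task ID only lets the snippet run outside SLURM.

```shell
#!/bin/bash
# Sketch: split TOTAL_RUNS across array tasks when the total is not an
# exact multiple of PER_TASK. TOTAL_RUNS and NUM_TASKS are illustrative names.
TOTAL_RUNS=5250
PER_TASK=1000

# Number of array tasks needed (ceiling division); here this gives 6,
# so the job would be submitted with --array=1-6.
NUM_TASKS=$(( (TOTAL_RUNS + PER_TASK - 1) / PER_TASK ))

# Fall back to the last task so the snippet also runs outside SLURM
TASK_ID=${SLURM_ARRAY_TASK_ID:-$NUM_TASKS}

# Same range calculation as in the sample script
START_NUM=$(( (TASK_ID - 1) * PER_TASK + 1 ))
END_NUM=$(( TASK_ID * PER_TASK ))

# Clamp the final task so it stops at the last real run
if (( END_NUM > TOTAL_RUNS )); then
    END_NUM=$TOTAL_RUNS
fi

echo "Need $NUM_TASKS tasks; task $TASK_ID does runs $START_NUM to $END_NUM"
```

Without the clamp, the last task would loop over run numbers that do not exist.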
Deleting job arrays and tasks
To delete all of the tasks of an array job, use scancel with the job ID:
scancel 292441
To delete a single task, add the task ID:
scancel 292441_5
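Slurm's job array documentation also describes a bracketed range form for cancelling several tasks at once. The sketch below only builds and prints such a command so it can be checked before being run; the variable names and the range 1-3 are chosen for illustration, and the job ID is the one used in the examples above.

```shell
#!/bin/bash
# Sketch: build a scancel command for a range of array tasks and print it
# rather than running it (a dry run). Variable names are illustrative.
JOB_ID=292441   # array job ID from the examples above
FIRST=1         # first task index to cancel
LAST=3          # last task index to cancel

CANCEL_CMD="scancel ${JOB_ID}_[${FIRST}-${LAST}]"
echo "$CANCEL_CMD"
```

When typing the printed command in a shell, quoting the task specification (for example '292441_[1-3]') keeps the shell from treating the brackets as a file name pattern.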
Controlling Job emails
By default in SLURM, the emails for events BEGIN, END and FAIL apply to the job array as a whole rather than individual tasks. So:
#SBATCH --mail-type=BEGIN,END,FAIL
would result in one email per event for the whole array job, not one per task. If you want per-task emails, specify:
#SBATCH --mail-type=BEGIN,END,FAIL,ARRAY_TASKS
which will send emails for each task in the array.