Slurm Cron Jobs

From UFRC

Revision as of 16:59, 17 April 2024

On HiPerGator you can use SCRON (Slurm CRON) to schedule periodically occurring jobs in Slurm. SCRON uses a syntax similar to that of the traditional Unix/Linux cron utilities.

SCRON combines the functionality of cron with the resiliency of the batch system. Jobs run on a cluster of nodes, so unlike with regular cron, a single node going down won't keep your SCRON job from running. You can also find and modify your SCRON jobs from any login node.

List Your current scrontab

You can view your existing scrontab entries with:

scrontab -l

Set up or Edit Your scrontab

Run scrontab -e to edit your scrontab file. The default editor for scrontab is vi, but you can specify your favorite editor. For example, if you prefer to use nano to edit files, run:

EDITOR=nano scrontab -e

You can also set the EDITOR environment variable to change the default editor, for example:

export EDITOR=/usr/bin/nano
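
As a minimal sketch of a fallback (the nano-then-vi preference order is only an illustration, not a site recommendation), the following sets EDITOR to nano when it is installed and otherwise to vi, the scrontab default mentioned above:

```shell
# Prefer nano when it is on the PATH; otherwise fall back to vi.
if command -v nano >/dev/null 2>&1; then
    EDITOR="$(command -v nano)"
else
    EDITOR=vi
fi
export EDITOR
echo "scrontab -e would open: $EDITOR"
```

Adding these lines to your ~/.bashrc would make the choice persist across logins.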

In scrontab, the lines that start with #SCRON are treated like the beginning of a new batch job and work like #SBATCH directives for batch jobs. Slurm will ignore #SBATCH directives in scripts you run as scrontab jobs. You can use most common sbatch options just as you would with sbatch on the command line. The first line after your #SCRON directives specifies the schedule for your job and the command to run.

Note: All of your scrontab jobs will start with your home directory as the working directory. You can change this with the --chdir Slurm option.


Crontab syntax

Crontab syntax is specified in five columns, to specify minutes, hours, days of the month, months, and days of the week. Especially at first you may find it easiest to use a helper application to generate your cron date fields, such as crontab-generator or cronhub.io. You can also use the short-hand syntax @hourly, @daily, @weekly, @monthly, and @yearly instead of the five separate columns.
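
To make the column order concrete, here is a small sketch in shell. The function explain_cron is a hypothetical helper written for this page, not part of Slurm or cron. It labels the five fields of a schedule, using the Wednesday 8:00 PM entry from the examples further down:

```shell
# Hypothetical helper: label the five crontab date fields of a schedule string.
explain_cron() {
    # read splits on whitespace and does not glob-expand the * fields
    read -r minute hour dom month dow <<EOF
$1
EOF
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow"
}

explain_cron "0 20 * * 3"
# prints: minute=0 hour=20 day-of-month=* month=* day-of-week=3
```

Reading the output left to right: minute 0 of hour 20, on any day of the month, in any month, on weekday 3 (Wednesday).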


What to Run

If you're running a script, it must be marked as executable. Jobs handled by scrontab do not run in a full login shell, so if you have customized your .bashrc file, you need to add:

source ~/.bashrc

to your script to ensure that your environment is set up correctly.
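
A minimal sketch of such a script follows. The file name mytest.sh matches the examples on this page, but the temporary directory and the echo placeholder are assumptions made only so the snippet is self-contained:

```shell
# Create a throwaway directory so the sketch does not touch real files.
workdir="$(mktemp -d)"

# Write a hypothetical mytest.sh that loads the user's environment first.
cat > "$workdir/mytest.sh" <<'EOF'
#!/bin/bash
# scrontab jobs do not start a full login shell, so load .bashrc explicitly.
[ -f ~/.bashrc ] && source ~/.bashrc
echo "mytest ran"   # placeholder for the real work
EOF

# Mark the script as executable so scrontab can run it directly.
chmod +x "$workdir/mytest.sh"
ls -l "$workdir/mytest.sh"
```

In practice you would create the script in the directory your scrontab entry runs from (your home directory, or the one given with --chdir) rather than in a temporary directory.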

Note: The command you specify in the scrontab is executed via bash, NOT sbatch. You can list multiple commands separated by ;, and use other shell features, such as redirects. Also, any #SBATCH directives in executed scripts will be ignored. You must use #SCRON in the scrontab file instead.

Note: Your scrontab jobs will appear to have the same JobID every time they run until the next time you edit your scrontab file (they are being requeued). This means that only the most recent job will be logged to the default output file. If you want deeper history, you should redirect output in your scripts to filenames with something more unique in their names, like a date or timestamp, e.g.

python my_script.py > $(date +"%Y-%m-%d")_myjob_scrontab.out

If you want to see Slurm accounting for a job handled by scrontab, for example job 12345, run:

sacct --duplicates --jobs 12345

or with short options:

sacct -Dj 12345



Scrontab Examples

This example submits a 6-hour test job eligible to start every day at 12:00 AM.


#SCRON --time 6:00:00
#SCRON --cpus-per-task 4
#SCRON --job-name "daily_test"
#SCRON --chdir /home/myusername/test
#SCRON -o myoutput/%j-out.txt
@daily ./mytest.sh

Run a Weekly Transfer Job

This example submits a transfer script eligible to start every Wednesday at 8:00 PM.


#SCRON --time 1:00:00
#SCRON --partition test
#SCRON --chdir /home/myusername/test
#SCRON -o test_log_%j.txt
0 20 * * 3 ./mytest.sh

Capture output from each run in a separate file

Normally scrontab will clobber the output file from the previous run on each execution, since each execution uses the same job ID. This can be avoided by redirecting to a date-stamped file, like:

0 20 * * 3 ./mytest.sh > myjob_$(date +%Y%m%d%H%M).out
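
The same idea can be tried outside scrontab. In this sketch the log directory and the echo command standing in for ./mytest.sh are assumptions; only the date format comes from the entry above:

```shell
# Hypothetical log directory used only for this demonstration.
log_dir="${TMPDIR:-/tmp}/scrontab_logs_demo"
mkdir -p "$log_dir"

# Minute-resolution timestamp: year, month, day, hour, minute (12 digits).
stamp="$(date +%Y%m%d%H%M)"
log_file="$log_dir/myjob_${stamp}.out"

# Stand-in for ./mytest.sh; 2>&1 also sends error messages to the same file.
echo "run at $stamp" > "$log_file" 2>&1
echo "wrote $log_file"
```

Two runs within the same minute would still overwrite each other with this format; adding %S to the date pattern avoids that.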


Monitoring Your Scrontab Jobs

You can monitor your scrontab jobs with

squeue --me -q cron -O JobID,EligibleTime

This will show the next time the batch system will run your job. If the scrontab job is set to repeat, the system will automatically reschedule the next job. Additionally, if you modify your scrontab job, Slurm will automatically cancel the old job and resubmit a new one.


Canceling a Scrontab job

To remove a scrontab job from your running jobs, you can edit the scrontab file with scrontab -e and comment out all the lines associated with the entry.

Using scancel on a scrontab job:

The scancel command will report an error when you attempt to cancel a job started with scrontab.

$ scancel 12345
scancel: error: Kill job error on job id 12345: Cannot scancel a scrontab job without the --hurry flag, or modify scrontab jobs through scontrol

If you cancel a scrontab job with the --hurry flag, the entry in the scrontab file will be prepended with #DISABLED. These comments will need to be removed before the job can start again.