Difference between revisions of "Slurm Cron Jobs"

From UFRC

Latest revision as of 20:22, 6 August 2024

On HiPerGator you can use SCRON (Slurm CRON) to schedule periodically recurring jobs in Slurm. SCRON uses a syntax similar to the traditional Unix/Linux cron utility.

SCRON combines the functionality of cron with the resiliency of the batch system. Jobs are run on a cluster of nodes, so unlike with regular cron, a single node going down won't keep your SCRON job from running. You can also find and modify your SCRON jobs on any login node.

SCRON jobs are managed by scrontab for each user.

List Your Current Scrontab

You can view your existing scripts (if any) in scrontab with:

scrontab -l

Set Up or Edit Your Scrontab

Run "scrontab -e" to create or edit your scrontab file. The default editor for scrontab is vi, but you can specify a different editor. For example, if you prefer to use nano to edit files, run:

EDITOR=nano scrontab -e 

You can also set the environment variable EDITOR to change the default editor before launching "scrontab -e", for example:

export EDITOR=/usr/bin/nano 

In scrontab the lines that start with #SCRON are treated like the beginning of a new batch job, and work like #SBATCH directives for regular Slurm batch jobs. Slurm will ignore #SBATCH directives in scripts you run as scrontab jobs. You can use most of the common sbatch options just as you would using sbatch on the command line. The first line after your SCRON directives specifies the schedule for your job and the command to run.

Note: All of your scrontab jobs will start with your home directory as the working directory. You can change this with the --chdir Slurm option.


Crontab syntax as used in regular Cron jobs and Scrontab is specified in five columns, to specify minutes, hours, days of the month, months, and days of the week. Especially at first you may find it easiest to use a helper application to generate your cron date fields, such as crontab-generator or cronhub.io. You can also use the short-hand syntax @hourly, @daily, @weekly, @monthly, and @yearly instead of the five separate columns.
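
As a rough illustration of the five-column layout described above, the sketch below splits a schedule entry into its fields and labels each one. The function name describe_schedule is hypothetical and exists only for this example; it is not part of Slurm or scrontab.

```shell
#!/bin/bash
# Label the five schedule fields of a cron-style entry.
# describe_schedule is a hypothetical helper for illustration only.
describe_schedule() {
    local minute hour dom month dow command
    # read -r splits on whitespace; everything after the fifth field is the command
    read -r minute hour dom month dow command <<< "$1"
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s command=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow" "$command"
}

describe_schedule '0 20 * * 3 ./mytest.sh'
# prints: minute=0 hour=20 day-of-month=* month=* day-of-week=3 command=./mytest.sh
```

Reading the output left to right, this entry is eligible to start at minute 0 of hour 20 on any day of any month, provided the day of the week is 3 (Wednesday).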

Normally scrontab will clobber the output file from the previous run on each execution, since each execution uses the same jobid. This can be avoided using a redirect to a date-stamped file like:

0 20 * * 3 ./mytest.sh > myjob_$(date +%Y%m%d%H%M).out
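
To preview the file name that such a redirect would produce, a minimal sketch follows. The helper stamped_name is hypothetical and only wraps the same date format string used in the entry above.

```shell
#!/bin/bash
# Build a date-stamped output file name from a prefix.
# stamped_name is a hypothetical helper; the format matches the redirect above.
stamped_name() {
    local prefix="$1"
    # %Y%m%d%H%M expands to year, month, day, hour, and minute (12 digits)
    printf '%s_%s.out\n' "$prefix" "$(date +%Y%m%d%H%M)"
}

stamped_name myjob
```

Because the stamp only has minute resolution, two runs that start within the same minute would still write to the same file.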

If you're running a script in scrontab it must be marked as executable in its permission settings. Jobs handled by scrontab do not run in a full login shell, so if you have customized your .bashrc file you need to add:

source ~/.bashrc

to your script to ensure that your environment is set up correctly.

Note: The command you specify in the scrontab is executed via bash, NOT sbatch. You can list multiple commands separated by ;, and use other shell features, such as redirects. Also, any #SBATCH directives in executed scripts will be ignored. You must use #SCRON in the scrontab file instead.

Note: To see Slurm accounting for a job handled by scrontab, for example job 12345, run:

sacct --duplicates --jobs 12345 

or with short options:

sacct -Dj 12345

Scrontab Examples

This example submits a 6-hour test job eligible to start every day at 12:00 AM:

#SCRON --time 6:00:00
#SCRON --cpus-per-task 4
#SCRON --name "daily_test"
#SCRON --chdir /home/myusername/test
#SCRON -o myoutput/%j-out.txt
@daily ./mytest.sh

The following example runs a test script eligible to start every Wednesday at 8:00 PM:

#SCRON --time 1:00:00
#SCRON --partition test
#SCRON --chdir /home/myusername/test
#SCRON -o test_log_%j.txt
0 20 * * 3 ./mytest.sh

The example below checks every hour whether an instance of the test job is running and, if not, starts one. The "--dependency=singleton" option in the scrontab prevents multiple instances of the same job from running at once:

#SCRON --qos=mygroup
#SCRON --account=myaccount
#SCRON --time=30-00:00:00
#SCRON --dependency=singleton
#SCRON --name=mytest
0 * * * * ./mytest.sh

Monitor Your Scrontab Jobs

You can monitor your scrontab jobs with:

squeue --me -q cron -O JobID,EligibleTime 

This will show the next time the batch system will run your job. If the scrontab job is set to repeat, the system will automatically reschedule the next job. Additionally, if you modify your scrontab job, Slurm will automatically cancel the old job and resubmit a new one.

Email notifications

If you prefer email notifications for your SCRON jobs, you can enable them in the same way as for other Slurm jobs. However, if the job runs frequently, this can generate a large number of email messages, which may cause problems with some email providers. To reduce email notifications for frequently running jobs, customers can either limit the condition under which email is sent (e.g. #SCRON --mail-type=FAIL) or set up a job log.

One of the easiest ways to set up a job log is to add a logging line to the end of your SCRON script:

echo "Closing timestamp: `date`" >> $HOME/scron/logs/testlogs.txt

This line will add a closing timestamp to a log file located in the customer's home directory every time the job runs. This provides a log of when the script ran correctly and when used in combination with the limited email condition above, can reduce email from scripts that run frequently.
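
The append will fail if the log directory does not exist yet. A minimal sketch that creates the directory first is shown below; the function name log_closing_timestamp and the LOG_DIR override are assumptions made for this example, while the default path is the one used above.

```shell
#!/bin/bash
# Append a closing timestamp to the job log, creating the log directory if needed.
# log_closing_timestamp and LOG_DIR are hypothetical names for illustration.
log_closing_timestamp() {
    local log_dir="${LOG_DIR:-$HOME/scron/logs}"
    # -p creates missing parent directories and is harmless if they already exist
    mkdir -p "$log_dir"
    echo "Closing timestamp: $(date)" >> "$log_dir/testlogs.txt"
}
```

Calling log_closing_timestamp as the last command of the script gives the same log entry as the single echo line, without depending on the directory having been created by hand.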

Cancel a Scrontab Job

To remove a scrontab job from your running jobs, you can edit the scrontab file with "scrontab -e" and comment out all the lines associated with the entry.

On the other hand, using the scancel command to remove a job started with scrontab will produce an error like:

$ scancel 12345 
scancel: error: Kill job error on job id 12345: Cannot scancel a scrontab job without the --hurry flag, or modify scrontab jobs through scontrol

If you cancel a scrontab job with the "--hurry" flag, the entry in the scrontab file will be prepended with #DISABLED. These comments will need to be removed before the job can start again.
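
If many entries were disabled, removing the markers by hand can be tedious. The sketch below strips the marker from text read on standard input. It assumes the marker is the literal text #DISABLED at the start of a line, optionally followed by a colon and a single space; check your own scrontab for the exact form. The function name enable_entries is hypothetical.

```shell
#!/bin/bash
# Strip a leading "#DISABLED" marker from scrontab lines read on standard input.
# Assumption: the marker may be followed by an optional colon and one space.
# enable_entries is a hypothetical helper for illustration only.
enable_entries() {
    sed -E 's/^#DISABLED:? ?//'
}

# Demonstration on sample text rather than a live scrontab.
printf '%s\n' '#SCRON --time 1:00:00' '#DISABLED: 0 20 * * 3 ./mytest.sh' | enable_entries
```

Lines without the marker pass through unchanged. Review the result before pasting it back into your scrontab with "scrontab -e".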