Sample SLURM Scripts

Back to Slurm

Below are a number of sample scripts that can be used as templates for building your own SLURM submission scripts for use on HiPerGator 2.0. These scripts are also located at /data/training/SLURM/ and can be copied from there. If you choose to copy one of these sample scripts, please make sure you understand what each #SBATCH directive means before using the script to submit your jobs. Otherwise, you may not get the result you want and may waste valuable computing resources.

Note: There is a maximum limit of 3000 jobs per user.

See Annotated SLURM Script for a step-by-step explanation of all options.

Memory requests

A large number of users request far more memory than their jobs use (100-10,000 times!). As an example, since August 1st, among groups that have run over 1,000 jobs, there are 28 groups whose users requested 100x the memory actually used in over half of those jobs. Groups often find themselves with jobs pending due to having reached their memory limits (QOSGrpMemLimit).

While it is important to request more memory than will be used (10-20% more is usually sufficient), requesting 100x, or even 10,000x, more memory only reduces the number of jobs a group can run and lowers overall throughput on the cluster. Many groups, and our overall user community, will be able to run far more jobs if they request more reasonable amounts of memory.

The email sent when a job finishes shows how much memory the job actually used and can be used to adjust memory requests for future jobs. The SLURM directives for memory requests are --mem and --mem-per-cpu. It is in the user's best interest to adjust the memory request to a more realistic value.

Requesting more memory than needed will not speed up analyses. Because their personal computers often run faster after adding memory, users sometimes believe that requesting more memory will make their analyses run faster. This is not the case. An application running on the cluster has access to all of the memory it requests, and we never swap RAM to disk. If an application can use more memory, it will get more memory. Only when the job crosses the limit based on the memory request does SLURM kill the job.
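
In addition to the completion email, the standard SLURM sacct utility can report how much memory a completed job actually used. A minimal sketch; the job ID 12345678 is a placeholder:

# Compare requested memory (ReqMem) with peak usage (MaxRSS) for a finished job;
# replace 12345678 with your own job ID
sacct -j 12345678 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State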

Basic, Single-Threaded Job

This script can serve as the template for many single-processor applications. The --mem flag (or --mem-per-cpu) can be used to request the appropriate amount of memory for your job. Please make sure to test your application and set this value to a reasonable number based on actual memory use. The %j in the --output line tells SLURM to substitute the job ID into the name of the output file. You can also add a -e or --error line with an error file name to separate output and error logs.
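
For example, to send errors to a separate file, a directive along these lines could be added (the filename pattern here is just illustrative):

#SBATCH --error=serial_test_%j.err   # Write standard error to its own file (%j = job ID)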


#!/bin/bash
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ufl.edu     # Where to send mail	
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=00:05:00               # Time limit hrs:min:sec
#SBATCH --output=serial_test_%j.log   # Standard output and error log
pwd; hostname; date

module load python

echo "Running plot script on a single CPU core"

python /data/training/SLURM/plot_template.py

date
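
A brief sketch of submitting and monitoring a job like the one above, assuming the script was saved as serial_job.sh (the filename is arbitrary; replace <jobid> with the ID sbatch reports):

sbatch serial_job.sh        # submit; SLURM replies with "Submitted batch job <jobid>"
squeue -u $USER             # list your pending and running jobs
scontrol show job <jobid>   # show detailed information for one job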

Multi-Threaded Jobs vs. Message Passing Interface (MPI)

For information and examples of scripts that use multiple threads or communication between tasks, see Multi-Threaded & Message Passing Job Scripts.

Hybrid MPI/Threaded job

This script can serve as a template for hybrid MPI/SMP applications. These are MPI applications where each MPI process is multi-threaded (usually via either OpenMP or POSIX Threads) and can use multiple processors.

Our testing has found that it is best to be very specific about how you want your MPI ranks laid out across nodes and even sockets (multi-core CPUs). SLURM and OpenMPI have some conflicting behavior if you leave too much to chance. Please refer to the full SLURM sbatch documentation, as well as the Multi-Threaded & Message Passing Job Scripts page linked above.

The following example requests 8 tasks, each with 4 cores. It further specifies that these should be split evenly across 2 nodes, and within each node the 4 tasks should be split evenly between the two sockets. So each socket on the two nodes will have 2 tasks, each with 4 cores. The --distribution option can be used to ensure that MPI ranks are distributed cyclically over nodes and sockets.


#!/bin/bash
#SBATCH --job-name=hybrid_job_test      # Job name
#SBATCH --mail-type=END,FAIL            # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ufl.edu       # Where to send mail	
#SBATCH --ntasks=8                      # Number of MPI ranks
#SBATCH --cpus-per-task=4               # Number of cores per MPI rank 
#SBATCH --nodes=2                       # Number of nodes
#SBATCH --ntasks-per-node=4             # How many tasks on each node
#SBATCH --ntasks-per-socket=2           # How many tasks on each CPU or socket
#SBATCH --mem-per-cpu=100mb             # Memory per core
#SBATCH --time=00:05:00                 # Time limit hrs:min:sec
#SBATCH --output=hybrid_test_%j.log     # Standard output and error log
pwd; hostname; date
 
module load  gcc/9.3.0  openmpi/4.1.1 raxml-ng/1.1.0
 
srun --mpi=$HPC_PMIX  raxml-ng ...
 
date

The following example requests 8 tasks, each with 8 cores. It further specifies that these should be split evenly across 4 nodes, and within each node the 2 tasks should be split, one on each of the two sockets. So each socket on the four nodes will have 1 task, each with 8 cores. The --distribution option can be used to ensure that MPI ranks are distributed cyclically over nodes and sockets.

Also note setting OMP_NUM_THREADS so that OpenMP knows how many threads to use per task.

  • Note that MPI gets -np from SLURM automatically.
  • Note there are many directives available to control processor layout.
    • Some to pay particular attention to are:
      • --nodes if you care exactly how many nodes are used
      • --ntasks-per-node to limit number of tasks on a node
      • --distribution one of several directives (see also --contiguous, --cores-per-socket, --mem_bind, --ntasks-per-socket, --sockets-per-node) to control how tasks, cores and memory are distributed among nodes, sockets and cores. While SLURM will generally make appropriate decisions for setting up jobs, careful use of these directives can significantly enhance job performance, and users are encouraged to profile application performance under different conditions; a minimal directive sketch follows this list.
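
For illustration only, a minimal sketch of combining these layout directives; the values are arbitrary and not tuned for any particular application:

#SBATCH --nodes=2                      # use exactly two nodes
#SBATCH --ntasks-per-node=4            # at most four tasks per node
#SBATCH --ntasks-per-socket=2          # at most two tasks per socket
#SBATCH --distribution=cyclic:cyclic   # assign ranks cyclically over nodes, then sockets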

A full example script follows:

#!/bin/bash
#SBATCH --job-name=LAMMPS
#SBATCH --output=LAMMPS_%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<email_address>
#SBATCH --nodes=4              # Number of nodes
#SBATCH --ntasks=8             # Number of MPI ranks
#SBATCH --ntasks-per-node=2    # Number of MPI ranks per node
#SBATCH --ntasks-per-socket=1  # Number of tasks per processor socket on the node
#SBATCH --cpus-per-task=8      # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem-per-cpu=2000mb   # Per processor memory request
#SBATCH --time=4-00:00:00      # Walltime in hh:mm:ss or d-hh:mm:ss
date;hostname;pwd

module load gcc/12.2.0 openmpi/4.1.5

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun --mpi=$HPC_PMIX /path/to/app/lmp_gator2 < in.Cu.v.24nm.eq_xrd

date

Array job

Please see the SLURM Job Arrays page for information on job arrays. Note that we take the simplest 'single-threaded' example from above and extend it to an array of jobs. Modify the following script using the parallel, MPI, or hybrid job layout as needed.


#!/bin/bash
#SBATCH --job-name=array_job_test   # Job name
#SBATCH --mail-type=FAIL            # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ufl.edu   # Where to send mail	
#SBATCH --ntasks=1                  # Run a single task
#SBATCH --mem=1gb                   # Job Memory
#SBATCH --time=00:05:00             # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.log    # Standard output and error log
#SBATCH --array=1-5                 # Array range
pwd; hostname; date

echo This is task $SLURM_ARRAY_TASK_ID

date


Note the use of %A for the master job ID of the array and %a for the task ID in the output filename.
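
A common pattern is to use the task ID to pick a different input for each array task. A minimal sketch, assuming a hypothetical file_list.txt with one input filename per line and a placeholder application my_program:

# Select the Nth line of file_list.txt, where N is this task's array index
INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" file_list.txt)
echo "Task $SLURM_ARRAY_TASK_ID will process $INPUT_FILE"
my_program "$INPUT_FILE"   # my_program is a placeholder for your application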

GPU job

Please see GPU Access for more information regarding the use of HiPerGator GPUs. Note that the order in which the environment modules are loaded is important.
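
For quick testing, a GPU can also be requested interactively with srun; a hedged sketch using the same partition and GPU syntax as the batch examples below (resource values are arbitrary):

srun --partition=gpu --gpus=a100:1 --ntasks=1 --cpus-per-task=2 --mem=8gb --time=01:00:00 --pty bash -i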

Example VASP GPU job script:

#!/bin/bash
#SBATCH --job-name=vasptest
#SBATCH --output=vasp.out
#SBATCH --error=vasp.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@ufl.edu
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=8
#SBATCH --distribution=cyclic:cyclic
#SBATCH --mem-per-cpu=7000mb
#SBATCH --partition=gpu
#SBATCH --gpus=a100:4
#SBATCH --time=00:30:00

module purge
module load cuda/12.2.0  intel/2020  openmpi/4.1.5 vasp/6.4.1

srun --mpi=${HPC_PMIX} vasp_gpu

Example NAMD GPU job script:

#!/bin/bash
#SBATCH --job-name=stmv
#SBATCH --output=std.out
#SBATCH --error=std.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=4
#SBATCH --distribution=block:block
#SBATCH --time=30:00:00
#SBATCH --mem-per-cpu=1gb
#SBATCH --mail-type=NONE
#SBATCH --mail-user=some_user@ufl.edu
#SBATCH --partition=gpu
#SBATCH --gpus=a100:2

module load cuda/11.0.207 intel/2020.0.166 namd/2.14b2

echo "NAMD2                = $(which namd2)"
echo "SBATCH_CPU_BIND_LIST = $SBATCH_CPU_BIND_LIST"
echo "SBATCH_CPU_BIND      = $SBATCH_CPU_BIND     "
echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"
echo "SLURM_CPUS_PER_TASK  = $SLURM_CPUS_PER_TASK "

# Turn the comma-separated CUDA_VISIBLE_DEVICES list into a space-separated list
gpuList=$(echo $CUDA_VISIBLE_DEVICES | sed -e 's/,/ /g')
# Build a zero-based device list (0,1,...), one entry per allocated GPU,
# to pass to namd2 +devices below
N=0
devList=""
for gpu in $gpuList
do
    devList="$devList $N"
    N=$(($N + 1))
done
devList=$(echo $devList | sed -e 's/ /,/g')
echo "devList = $devList"

namd2 +p$SLURM_CPUS_PER_TASK +idlepoll +devices $devList stmv.namd