Difference between revisions of "Slurm and GPU Use"

From UFRC
 
(6 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
+
[[Category:Scheduler]][[Category:GPU]]
 +
See also: [[GPU Access]]
 
{|align=right
 
{|align=right
 
   |__TOC__
 
   |__TOC__
 
   |}
 
   |}
== Policy ==
+
After purchase, NGU allocations are included in your groups resources (quality of service).  
* After purchase, NGU allocations are included in your groups resources (quality of service).  
+
* To increase the availability of GPU resources, the time limit for the gpu partition is 14-days (at most <code>#SBATCH --time=14-00:00:00</code>). If you have a workload requiring more time, please create a [https://support.rc.ufl.edu/enter_bug.cgi help request].
* To increase the availability of GPU resources, the time limit for the gpu partition is 7-days (at most <code>#SBATCH --time=7-00:00:00</code>). If you have a workload requiring more time, please create a [https://support.rc.ufl.edu/enter_bug.cgi help request].
 
  
 
==Interactive Access ==
 
==Interactive Access ==
In order to request interactive command line access to a GPU under SLURM, use commands similar to these:  
+
Interactive sessions are limited to 12 hours. In order to request interactive command line access to a GPU under SLURM, use commands similar to these:  
 
+
<div style="column-count:2">
 
*To request access to one GPU (of any type) for a default 1 hour session:
 
*To request access to one GPU (of any type) for a default 1 hour session:
 
  srun -p gpu --gpus=1 --pty -u bash -i
 
  srun -p gpu --gpus=1 --pty -u bash -i
 
*To request access to two A100 GPUs on a single node for a 3-hour session with 300gb RAM:
 
*To request access to two A100 GPUs on a single node for a 3-hour session with 300gb RAM:
 
  srun -p gpu --nodes=1 --gpus=a100:2 --time=03:00:00 --mem=300gb  --pty -u bash -i
 
  srun -p gpu --nodes=1 --gpus=a100:2 --time=03:00:00 --mem=300gb  --pty -u bash -i
 +
</div>
 
*To request access to two GeForce GPUs with multiple CPUs:
 
*To request access to two GeForce GPUs with multiple CPUs:
 
  srun -p gpu --nodes=1 --gpus=geforce:2 --time=01:00:00 --ntasks=1 --cpus-per-task=8 --mem 300gb  --pty -u bash -i
 
  srun -p gpu --nodes=1 --gpus=geforce:2 --time=01:00:00 --ntasks=1 --cpus-per-task=8 --mem 300gb  --pty -u bash -i
 
Interactive sessions are limited to 12 hours.
 
  
 
==Open On Demand Access ==
 
==Open On Demand Access ==
Line 37: Line 36:
 
== Batch Jobs ==
 
== Batch Jobs ==
 
For batch jobs, to request GPU resources, use lines similar to the following:  
 
For batch jobs, to request GPU resources, use lines similar to the following:  
 
+
<div style="column-count:2">
 
*In this example, two A100 GPUs on a single server (--nodes defaults to "1") will be allocated to the job:
 
*In this example, two A100 GPUs on a single server (--nodes defaults to "1") will be allocated to the job:
 
<pre>
 
<pre>
Line 49: Line 48:
 
#SBATCH --gpus=geforce:2
 
#SBATCH --gpus=geforce:2
 
</pre>
 
</pre>
 
+
</div>
  
 
Alternatively, use '<code>--gres=gpu:1</code>' or '<code>--gres=gpu:geforce:1</code>' format. Note, if '--gpus=' format is used SLURM will not provide the data on GPU usage to slurmInfo and those GPUs will not be shown in slurmInfo output.
 
Alternatively, use '<code>--gres=gpu:1</code>' or '<code>--gres=gpu:geforce:1</code>' format. Note, if '--gpus=' format is used SLURM will not provide the data on GPU usage to slurmInfo and those GPUs will not be shown in slurmInfo output.

Latest revision as of 18:04, 27 June 2024

See also: GPU Access

After purchase, NGU allocations are included in your group's resources (quality of service).

  • To increase the availability of GPU resources, the time limit for the gpu partition is 14 days (at most #SBATCH --time=14-00:00:00). If you have a workload requiring more time, please create a help request.

Interactive Access

Interactive sessions are limited to 12 hours. To request interactive command-line access to a GPU under SLURM, use commands similar to these:

  • To request access to one GPU (of any type) for a default 1-hour session:
srun -p gpu --gpus=1 --pty -u bash -i
  • To request access to two A100 GPUs on a single node for a 3-hour session with 300gb RAM:
srun -p gpu --nodes=1 --gpus=a100:2 --time=03:00:00 --mem=300gb  --pty -u bash -i
  • To request access to two GeForce GPUs with multiple CPUs:
srun -p gpu --nodes=1 --gpus=geforce:2 --time=01:00:00 --ntasks=1 --cpus-per-task=8 --mem 300gb  --pty -u bash -i

Open On Demand Access

To access GPUs using Open-On-Demand, please check the form for your application. If your application supports multiple GPU types, choose the GPU partition and specify the number and type of GPUs:

  • To request access to one GPU of any type, use this gres string:
gpu:1
  • To request multiple GPUs of any type, use this gres string, where n is the number of GPUs you need:
gpu:n
  • To request a specific type of GPU, use this gres string (requesting geforce GPUs in this example):
gpu:geforce:1
  • To request an A100 GPU, use this gres string:
gpu:a100:1
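
The gres strings above follow one pattern: gpu:n for a GPU of any type, and gpu:type:n for a specific type. As a minimal sketch, the helper below assembles the string from a count and an optional type. The name gres_string is hypothetical and is not a UFRC or SLURM tool; only the string formats come from the list above.

```shell
# Hypothetical helper (illustration only): build a gres string.
# Usage: gres_string COUNT [TYPE]
gres_string() {
    gpu_count="$1"
    gpu_type="${2:-}"
    if [ -n "$gpu_type" ]; then
        # A specific GPU type was requested, e.g. gpu:geforce:1
        printf 'gpu:%s:%s\n' "$gpu_type" "$gpu_count"
    else
        # Any GPU type is acceptable, e.g. gpu:1
        printf 'gpu:%s\n' "$gpu_count"
    fi
}

gres_string 1           # one GPU of any type
gres_string 1 geforce   # one GeForce GPU
gres_string 1 a100      # one A100 GPU
```

The printed value is the string to enter in the application form.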

Batch Jobs

For batch jobs, to request GPU resources, use lines similar to the following:

  • In this example, two A100 GPUs on a single server (--nodes defaults to "1") will be allocated to the job:
#SBATCH --partition=gpu
#SBATCH --gpus=a100:2
  • In this example, two GeForce GPUs on a single server (--nodes defaults to "1") will be allocated to the job:
#SBATCH --partition=gpu
#SBATCH --gpus=geforce:2

Alternatively, use the '--gres=gpu:1' or '--gres=gpu:geforce:1' format. Note that if the '--gpus=' format is used, SLURM will not provide GPU usage data to slurmInfo, and those GPUs will not be shown in slurmInfo output.
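
For illustration, the A100 request from the first example could be written in the gres form as sketched below. This assumes a single-node job, where a per-node gres request and a per-job --gpus request ask for the same number of GPUs; the type name a100 is taken from the gres strings listed for Open On Demand.

```shell
#!/bin/bash
# Sketch of a batch header using the gres form (single node assumed).
# Two A100 GPUs on the node, requested so that the usage is reported
# to slurmInfo.
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2
```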

If no GPUs are available, your request will be queued and your connection will be established once the next GPU becomes available. Alternatively, you may cancel your job and try requesting fewer resources. If you have requested more time than you need, please be sure to end your session so that the GPU becomes available to other users.

SLURM Options for A100 GPUs

To use A100 GPUs for interactive sessions or batch jobs, please use the following SLURM parameters:

--partition=gpu
--gpus=a100:2

Job Script Example

This is a sample script for an MPI-parallel VASP job requesting and using GPUs under SLURM:

#!/bin/bash
#SBATCH --job-name=vasptest
#SBATCH --output=vasp.out
#SBATCH --error=vasp.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@ufl.edu
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=8
#SBATCH --ntasks-per-socket=4
#SBATCH --mem-per-cpu=7000mb
#SBATCH --distribution=cyclic:cyclic
#SBATCH --partition=gpu
#SBATCH --gres=gpu:geforce:4
#SBATCH --time=00:30:00

echo "Date      = $(date)"
echo "host      = $(hostname -s)"
echo "Directory = $(pwd)"

module purge
module load cuda/10.0.130  intel/2018  openmpi/4.0.0 vasp/5.4.4

T1=$(date +%s)
srun --mpi=pmix_v3 vasp_gpu
T2=$(date +%s)

ELAPSED=$((T2 - T1))
echo "Elapsed Time = $ELAPSED"
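
To confirm that a job received the GPUs it requested, a check like the following could be added to the script before the srun line. This is only a sketch: it assumes SLURM exports CUDA_VISIBLE_DEVICES as a comma-separated list for jobs with allocated GPUs, which is common but depends on the cluster configuration. The function name count_visible_gpus is hypothetical.

```shell
# Hypothetical helper (illustration only): count the GPUs exposed to
# this job through CUDA_VISIBLE_DEVICES. The body runs in a subshell,
# so changing IFS here does not affect the rest of the script.
count_visible_gpus() (
    devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        # Variable unset or empty: no GPUs are visible.
        echo 0
        exit 0
    fi
    # Split the comma-separated device list and count its entries.
    IFS=','
    set -- $devices
    echo "$#"
)

echo "Visible GPUs = $(count_visible_gpus)"
```

With the sample script above, which requests four GeForce GPUs, the reported count would be expected to match the request if the assumption about CUDA_VISIBLE_DEVICES holds.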