Difference between revisions of "GPU Access"

From UFRC
 
(40 intermediate revisions by 10 users not shown)
Line 1: Line 1:
[[Category:Scheduler]]
+
[[Category:Scheduler]][[Category:GPU]]
 +
{|align=right
 +
  |__TOC__
 +
  |}
 +
==GPUs Per Group==
  
{{Note|Interactive Jobs in the GPU partition are limited to 6 hrs|warn}}
+
Normalized Graphics Processor Units (NGUs) include all of the infrastructure (memory, network, rack space, cooling) necessary for GPU-accelerated computation. Each NGU is equivalent to 1 GPU presently, however newer GPUs such as the A100s may require more than 1 NGU to access in the future.
  
Researchers may use GPUs in the form of Normalized Graphics Processor Units (NGUs), which include all of the infrastructure (memory, network, rack space, cooling), necessary for GPU-accelerated computation.  
+
In order to use GPU resources HPG groups need to have an active NGU investment. To check if your group(s) has GPUs allocated and available with the command <code>$ slurmInfo -g group_name</code> (with the [https://help.rc.ufl.edu/doc/UFRC_environment_module module "ufrc"] loaded).
  
Groups that do not have GPU allocations can invest into GPUs by filling out the [https://www.rc.ufl.edu/services/purchase-request/ Purchase Form] or by requesting a [https://www.rc.ufl.edu/services/request-trial-allocation/ Trial Allocation] if they never purchased GPUs and would like to try them out to see the benefit before purchasing.
+
Researchers can add NGUs to their allocations by filling out the [https://www.rc.ufl.edu/get-started/purchase-allocation/ Purchase Form] or requesting a [https://www.rc.ufl.edu/services/request-trial-allocation/ Trial Allocation].
  
=GPU-enabled Services=
+
==Select a GPU Partition==
 +
There are two partitions that contain GPUs: the ''hwgui'' partition for visualization and the ''gpu'' partition for general GPU computation.
  
We have two types of GPU services for two different kinds of applications.  
+
==Open On Demand Access==
 +
{{Note|Interactive OnDemand Jobs in the GPU partition are limited to 12 hrs. Computational GPU jobs are limited to 14 days. Each GPU job requires at least one CPU core|warn}}
  
== Hardware Accelerated GUI ==
+
To access GPUs using [https://help.rc.ufl.edu/doc/Open_OnDemand Open OnDemand], please check the form for your application.  If your application supports multiple GPU types, choose the GPU partition and specify number of GPUs and type:
 +
<div style="column-count:2">
 +
*To request access to one GPU (of any type, use this gres string):
 +
gpu:1
 +
 
 +
*To request multiple GPUs (of any type, use this gres string were n is the number of GPUs you need):
 +
gpu:n
 +
 
 +
*To request a specific type of GPU, use this gres string (requesting geforce GPUs in this example):
 +
gpu:geforce:1
 +
 
 +
*To request a A100 GPU, use this gres string:
 +
gpu:a100:1
 +
</div>
 +
 
 +
 
 +
==GPU-enabled Services==
 +
 
 +
Types of GPU partitions are listed below.
 +
 
 +
=== Hardware Accelerated GUI ===
  
 
GPUs in these servers are used to accelerate rendering for graphical applications. These servers are in the SLURM "'''hwgui'''" partition. Refer to the '''[[Hardware Accelerated GUI Sessions]]''' page for more information on available resources and usage.
 
GPUs in these servers are used to accelerate rendering for graphical applications. These servers are in the SLURM "'''hwgui'''" partition. Refer to the '''[[Hardware Accelerated GUI Sessions]]''' page for more information on available resources and usage.
  
== GPU Assisted Computation ==
+
=== GPU Assisted Computation ===
  
A number of high performance applications installed on HiPerGator implement GPU-accelerated computing functions via CUDA to achieve significant speed-up over CPU implementations. These servers are in the SLURM '''"gpu"''' partition (<code>--partition=gpu</code>).
+
A number of high performance applications installed on HiPerGator implement GPU-accelerated computing functions via CUDA to achieve significant speed-up over CPU calculations. These servers are in the SLURM '''"gpu"''' partition (<code>--partition=gpu</code>).
  
=== Hardware Specifications for the GPU Partition===
+
==== Hardware Specifications for the GPU Partition====
 
We have the following types of NVIDIA GPU nodes available in the "gpu" partition:
 
We have the following types of NVIDIA GPU nodes available in the "gpu" partition:
 
* NVIDIA K80s, with 2 GPUs per card and 2 cards per node. See [https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/tesla-product-literature/Tesla-K80-BoardSpec-07317-001-v05.pdf technical specifications] for reference.
 
* NVIDIA GeForce GTX 1080 Ti, with 1 GPU per card and 2 cards per node. See [https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-1080-ti/specifications technical specifications] for reference.
 
* NVIDIA GeForce RTX 2080Ti, with 1 GPU per card and 8 cards per node. See [https://www.nvidia.com/en-us/geforce/graphics-cards/rtx-2080-ti technical specifications] for reference.
 
* NVIDIA Quadro RTX 6000, with 8 cards per node. These GPUs have SLI bridging See [https://www.nvidia.com/en-us/design-visualization/quadro/rtx-6000/ technical specifications] for reference.
 
* AI NVIDIA DGX A100 SuperPod, with 8 cards per node. These GPUs have [https://www.nvidia.com/en-us/data-center/nvlink/ NVSWITCH] interconnects See [https://www.nvidia.com/en-us/data-center/a100/ technical specifications] for reference.
 
 
 
{| style="margin-left: 5px; width:80%"
 
{| style="margin-left: 5px; width:80%"
 
|
 
|
 
{| class="wikitable" style="text-align: center"
 
{| class="wikitable" style="text-align: center"
!GPU!!Host Quantity!!Host Architecture!!Host Memory!!Host Interconnect!!CPUs per Host!!CPUS per Socket!!GPUs per Host!!CPUs per GPU!!Memory per GPU!!SLURM Feature!!GRES GPU type
+
!GPU Specs!!Host Quantity!!Host Architecture!!Host Memory!!Host Interconnect!!CPUs per Host!!CPUS per Socket!!GPUs per Host!!CPUs per GPU!!Memory per GPU!!SLURM Feature!!GRES GPU type!!Technical Ref
 
|-
 
|-
| style="width: 14%;"|GeForce 1080Ti||1||Intel Haswell||128 GB||FDR IB||28||14||2||14||11GB||n/a||geforce
+
| style="width: 14%;"|GeForce 1080Ti||1||Intel Haswell||128 GB||FDR IB||28||14||2||14||11GB||n/a||geforce||[https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-1080-ti/specifications Specifications]
 
|-
 
|-
| style="width: 14%;"|GeForce 2080Ti||32||Intel Skylake||187 GB||EDR IB||32||16||8||4||11GB||2080ti||geforce
+
| style="width: 14%;"|GeForce 2080Ti||32||Intel Skylake||187 GB||EDR IB||32||16||8||4||11GB||2080ti||geforce||[https://www.nvidia.com/en-us/geforce/graphics-cards/rtx-2080-ti Specifications]
 
|-
 
|-
| style="width: 14%;"|GeForce 2080Ti||38||Intel Cascade Lake||187 GB||EDR IB||32||16||8||4||11GB||2080ti||geforce
+
| style="width: 14%;"|GeForce 2080Ti||38||Intel Cascade Lake||187 GB||EDR IB||32||16||8||4||11GB||2080ti||geforce||[https://www.nvidia.com/en-us/geforce/graphics-cards/rtx-2080-ti Specifications]
 
|-
 
|-
| style="width: 14%;"|Quadro RTX 6000||6||Intel Cascade Lake||187 GB||EDR IB||32||16||8||4||23GB||rtx6000||quadro
+
| style="width: 14%;"|Quadro RTX 6000 SLI||6||Intel Cascade Lake||187 GB||EDR IB||32||16||8||4||23GB||rtx6000||quadro||[https://www.nvidia.com/en-us/design-visualization/quadro/rtx-6000/ Specifications]
 
|-
 
|-
| style="width: 14%;"|NVIDIA A100 ||140||AMD EPYC ROME||2 TB||HDR IB||128||16||8||16||80GB||a100||tesla (to change to a100 after August 16th, 2021)
+
| style="width: 14%;"|NVIDIA A100 [https://www.nvidia.com/en-us/data-center/nvlink/ NVSWITCH]||140||AMD EPYC ROME||2 TB||HDR IB||128||16||8||16||80GB||a100||a100||[https://www.nvidia.com/en-us/data-center/a100/ Specifications]
 
|}
 
|}
 
|}
 
|}
  
For a list of node features, and GPU name designations, see the [[Available Node Features]] page.
+
For a list of additional node features, see the [[Available Node Features]] page.
 
 
To select a specific type of GPU within a partition please use either a constraint for SLURM feature or a GRES with the needed GPU type.
 
 
 
= Compiling CUDA Enabled Programs =
 
 
 
To compile CUDA programs, please refer to the [[Nvidia CUDA Toolkit]] page. The current CUDA environment is cuda/10.
 
 
 
= GPU Use Under Slurm =
 
 
 
== Policy ==
 
* GPUs are allocated only via the investment QOS.
 
* To increase the availability of GPU resources, the time limit for the gpu partition is 7-days (at most <code>#SBATCH --time=7-00:00:00</code>).
 
 
 
==Interactive Access ==
 
In order to request interactive access to a GPU under SLURM, use commands similar to those that follow.
 
 
 
*To request access to one GPU (of any type) for a default 10-minute session:
 
srun -p gpu --gpus=1 --pty -u bash -i
 
*To request access to two Tesla GPUs on a single node for a 1-hour session:
 
srun -p gpu --nodes=1 --gpus=tesla:2 --time=01:00:00  --pty -u bash -i
 
*To request access to two GeForce GPUs on a single node for a 1-hour session:
 
srun -p gpu --nodes=1 --gpus=geforce:2 --time=01:00:00  --pty -u bash -i
 
 
 
==Open On Demand Access ==
 
To access GPUs using Open-On-Demand, please check the form for your application.  If your application supports multiple GPU types, use the GPU type to select between GPU types:
 
 
 
*To request access to one GPU (of any type, use this gres string):
 
gpu:1
 
 
 
*To request multiple GPUs (of any type, use this gres string were n is the number of GPUs you need):
 
gpu:n
 
 
 
*To request a specific type of GPU, use this gres string (requesting geforce GPUs in this example):
 
gpu:geforce:1
 
 
 
== Batch Jobs ==
 
For batch jobs, to request GPU resources, use lines similar to the following in your submission script.
 
 
 
*In this example, two Tesla GPUs on a single server (--nodes defaults to "1") will be allocated to the job:
 
<pre>
 
#SBATCH --partition=gpu
 
#SBATCH --gpus=tesla:2
 
</pre>
 
 
 
*In this example, two GeForce GPUs on a single server (--nodes defaults to "1") will be allocated to the job:
 
<pre>
 
#SBATCH --partition=gpu
 
#SBATCH --gpus=geforce:2
 
</pre>
 
 
 
 
 
Alternatively, use '<code>--gres=gpu:1</code>' or '<code>--gres=gpu:geforce:1</code>' format. Note, if '--gpus=' format is used SLURM will not provide the data on GPU usage to slurmInfo and those GPUs will not be shown in slurmInfo output.
 
 
 
If no GPUs are available, your request will be queued and your connection established once the next GPU becomes available. Otherwise, you may cancel your job and try connecting again at a later time. If you have requested a longer time than is needed, please be sure to end your session so that the GPU will be available for other users.
 
  
== Exclusive Mode ==
+
To select a specific type of GPU within a partition please use either a SLURM constraint (e.g. --constraint=rtx6000) or a GRES with the needed GPU type (--gres or --gpu=a100:1).
The GPUs are configured to run in '''exclusive''' mode.  This means that the gpu driver will only allow one process at a time to access the GPU. If GPU 0 is in use and your application tries to use it, it will simply block. If your application does not call cudaSetDevice(), the CUDA runtime should assign it to a free GPU.  Since everyone will be accessing the GPUs through the batch system, there should be no over-subscription of the GPUs.
 
  
== Job Script Examples ==
+
== Compiling CUDA Enabled Programs ==
===MPI Parallel===
 
  
This is a sample script for MPI parallel VASP job requesting and using GPUs under SLURM:
+
The most direct way to develop a custom GPU accelerated algorithm is with the CUDA programming, please refer to the [[Nvidia CUDA Toolkit]] page. The current CUDA environment is cuda/11. However, C++ or Python packages numba and PyCuda are other ways to program GPU algorithms.
  
<pre>
+
== Conda Environments with GPU ==
#!/bin/bash
 
#SBATCH --job-name=vasptest
 
#SBATCH --output=vasp.out
 
#SBATCH --error=vasp.err
 
#SBATCH --mail-type=ALL
 
#SBATCH --mail-user=email@ufl.edu
 
#SBATCH --nodes=1
 
#SBATCH --ntasks=8
 
#SBATCH --cpus-per-task=1
 
#SBATCH --ntasks-per-node=8
 
#SBATCH --ntasks-per-socket=4
 
#SBATCH --mem-per-cpu=7000mb
 
#SBATCH --distribution=cyclic:cyclic
 
#SBATCH --partition=gpu
 
#SBATCH --gres=gpu:geforce:4
 
#SBATCH --time=00:30:00
 
  
echo "Date      = $(date)"
+
To make sure your code will run on GPUs install a recent <code>cudatoolkit</code> package that works with the NVIDIA drivers on HPG (currently 12.x, but older versions are still supported) alongside the pytorch or tensorflow package(s). See RC provided tensorflow or pytorch installs for examples if needed. Mamba can detect if there is a gpu in the environment, so the easiest approach is to run the mamba install command in a gpu session. Alternatively, you can run mamba install on any node or if a cpu-only pytorch package was already installed by explicitly requiring a gpu version of pytorch when running mamba install. E.g.
echo "host      = $(hostname -s)"
+
mamba install cudatoolkit=11.3 pytorch=1.12.1=gpu_cuda* -c pytorch
echo "Directory = $(pwd)"
 
  
module purge
+
See also [[Conda]].
module load cuda/10.0.130  intel/2018  openmpi/4.0.0 vasp/5.4.4
 
  
T1=$(date +%s)
+
== Multiple GPUs ==
srun --mpi=pmix_v3 vasp_gpu
+
Find the following resource for [https://github.com/YunchaoYang/MultiGPUTraining2023 Multi-GPU Training].
T2=$(date +%s)
 
  
ELAPSED=$((T2 - T1))
+
== Slurm and GPU Use ==
echo "Elapsed Time = $ELAPSED"
+
View instructions for using GPUs and scheduling GPU jobs with SLURM at [[Slurm and GPU Use]]
</pre>
 

Latest revision as of 18:02, 27 June 2024

GPUs Per Group

Normalized Graphics Processor Units (NGUs) include all of the infrastructure (memory, network, rack space, cooling) necessary for GPU-accelerated computation. Each NGU is presently equivalent to 1 GPU; however, newer GPUs such as the A100s may require more than 1 NGU to access in the future.

To use GPU resources, HPG groups need to have an active NGU investment. To check whether your group has GPUs allocated and available, run the command slurmInfo -g group_name (with the module "ufrc" loaded).

Researchers can add NGUs to their allocations by filling out the Purchase Form or requesting a Trial Allocation.

Select a GPU Partition

There are two partitions that contain GPUs: the hwgui partition for visualization and the gpu partition for general GPU computation.

Open On Demand Access

Interactive OnDemand jobs in the GPU partition are limited to 12 hours. Computational GPU jobs are limited to 14 days. Each GPU job requires at least one CPU core.

To access GPUs using Open OnDemand, please check the form for your application. If your application supports multiple GPU types, choose the GPU partition and specify the number and type of GPUs:

  • To request access to one GPU of any type, use this gres string:
gpu:1
  • To request multiple GPUs of any type, use this gres string, where n is the number of GPUs you need:
gpu:n
  • To request a specific type of GPU, use this gres string (requesting geforce GPUs in this example):
gpu:geforce:1
  • To request an A100 GPU, use this gres string:
gpu:a100:1
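
The gres strings above all follow the pattern gpu:[type:]count. As a minimal sketch of that pattern (gres_string is a hypothetical helper for illustration, not a tool provided on HiPerGator), a small shell function can assemble the string from a count and an optional type:

```shell
#!/bin/sh
# Hypothetical helper (not a HiPerGator tool): build a gres string
# from a GPU count and an optional GPU type.
gres_string() {
    count="$1"
    gpu_type="${2:-}"
    if [ -n "$gpu_type" ]; then
        # A type was given, e.g. "geforce" or "a100"
        printf 'gpu:%s:%s\n' "$gpu_type" "$count"
    else
        # No type given: any available GPU type is acceptable
        printf 'gpu:%s\n' "$count"
    fi
}

gres_string 1            # one GPU of any type
gres_string 4            # four GPUs of any type
gres_string 1 geforce    # one GeForce GPU
gres_string 1 a100       # one A100 GPU
```

The resulting string is what goes into the gres field of the Open OnDemand form.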


GPU-enabled Services

Types of GPU partitions are listed below.

Hardware Accelerated GUI

GPUs in these servers are used to accelerate rendering for graphical applications. These servers are in the SLURM "hwgui" partition. Refer to the Hardware Accelerated GUI Sessions page for more information on available resources and usage.

GPU Assisted Computation

A number of high performance applications installed on HiPerGator implement GPU-accelerated computing functions via CUDA to achieve significant speed-up over CPU calculations. These servers are in the SLURM "gpu" partition (--partition=gpu).

Hardware Specifications for the GPU Partition

We have the following types of NVIDIA GPU nodes available in the "gpu" partition:

GPU Specs | Host Quantity | Host Architecture | Host Memory | Host Interconnect | CPUs per Host | CPUs per Socket | GPUs per Host | CPUs per GPU | Memory per GPU | SLURM Feature | GRES GPU type | Technical Ref
GeForce 1080Ti | 1 | Intel Haswell | 128 GB | FDR IB | 28 | 14 | 2 | 14 | 11 GB | n/a | geforce | Specifications
GeForce 2080Ti | 32 | Intel Skylake | 187 GB | EDR IB | 32 | 16 | 8 | 4 | 11 GB | 2080ti | geforce | Specifications
GeForce 2080Ti | 38 | Intel Cascade Lake | 187 GB | EDR IB | 32 | 16 | 8 | 4 | 11 GB | 2080ti | geforce | Specifications
Quadro RTX 6000 SLI | 6 | Intel Cascade Lake | 187 GB | EDR IB | 32 | 16 | 8 | 4 | 23 GB | rtx6000 | quadro | Specifications
NVIDIA A100 NVSWITCH | 140 | AMD EPYC ROME | 2 TB | HDR IB | 128 | 16 | 8 | 16 | 80 GB | a100 | a100 | Specifications

For a list of additional node features, see the Available Node Features page.

To select a specific type of GPU within a partition, use either a SLURM constraint (e.g. --constraint=rtx6000) or a GRES with the needed GPU type (e.g. --gres=gpu:a100:1 or --gpus=a100:1).
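
As an illustration of selecting a GPU type, the following is a minimal batch script sketch. The job name and time limit are placeholder assumptions, and the script only reports which GPUs the job can see; adapt the resource lines to your group's allocation.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_check
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
# Alternative to the typed GRES above: request any GPU and add a feature
# constraint. The leading "##" keeps Slurm from reading these two lines.
##SBATCH --gres=gpu:1
##SBATCH --constraint=rtx6000

# Slurm normally exports CUDA_VISIBLE_DEVICES for jobs that were granted GPUs.
visible="${CUDA_VISIBLE_DEVICES:-none}"
echo "Host: $(hostname -s)"
echo "Visible GPUs: $visible"

# Report the GPU model and memory if the NVIDIA tools are on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv
else
    echo "nvidia-smi not found on this host"
fi
```

Submitted with sbatch, the job output should list the allocated GPU. Run outside of a Slurm job, the script simply reports that no GPUs are visible.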

Compiling CUDA Enabled Programs

The most direct way to develop a custom GPU-accelerated algorithm is to program in CUDA; please refer to the Nvidia CUDA Toolkit page. The current CUDA environment is cuda/11. Alternatively, GPU algorithms can be programmed in C++ or with the Python packages numba and PyCuda.
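
As a rough sketch of the compile step, the script below writes a tiny CUDA program to a temporary directory and builds it with nvcc if the compiler is available. The file names are arbitrary, and it assumes a CUDA module (for example, module load cuda/11) has already been loaded; see the Nvidia CUDA Toolkit page for the supported workflow.

```shell
#!/bin/bash
# Sketch: write a minimal CUDA program and compile it when nvcc is on the PATH.
# Assumption: a CUDA module (e.g. "module load cuda/11") was loaded beforehand.
workdir="$(mktemp -d)"
src="$workdir/hello.cu"

cat > "$src" <<'EOF'
#include <cstdio>

// Each of the 4 threads in the single block prints its own index.
__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel to finish before exiting
    return 0;
}
EOF

if command -v nvcc >/dev/null 2>&1; then
    nvcc -o "$workdir/hello" "$src" && echo "Built $workdir/hello"
else
    echo "nvcc not found; load a CUDA module before compiling"
fi
```

The resulting binary needs a GPU to run, so execute it inside a GPU session or batch job rather than on a login node.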

Conda Environments with GPU

To make sure your code will run on GPUs, install a recent cudatoolkit package that works with the NVIDIA drivers on HPG (currently 12.x, but older versions are still supported) alongside the pytorch or tensorflow package(s). See the RC-provided tensorflow or pytorch installs for examples if needed. Mamba can detect whether there is a GPU in the environment, so the easiest approach is to run the mamba install command in a GPU session. Alternatively, if you run mamba install on a node without a GPU, or if a CPU-only pytorch package is already installed, explicitly require a GPU build of pytorch in the mamba install command, e.g.:

mamba install cudatoolkit=11.3 "pytorch=1.12.1=gpu_cuda*" -c pytorch

See also Conda.
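
After installing, it can be worth confirming that the environment actually received a GPU build. The following is a minimal sketch, assuming the conda environment is already activated and that pytorch is the framework in use; torch.cuda.is_available() reports whether pytorch can reach a CUDA device.

```shell
#!/bin/bash
# Sketch: check whether the pytorch in the active environment can see a GPU.
# Assumption: the target conda environment has already been activated.
if python -c "import torch" >/dev/null 2>&1; then
    # Prints True when a CUDA-enabled build finds a usable GPU, else False.
    status="$(python -c 'import torch; print(torch.cuda.is_available())')"
else
    status="pytorch could not be imported"
fi
echo "CUDA available: $status"
```

Run this in a GPU session: on a node without a GPU, even a correctly installed GPU build reports False.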

Multiple GPUs

For guidance on training across multiple GPUs, see the Multi-GPU Training resource (https://github.com/YunchaoYang/MultiGPUTraining2023).

Slurm and GPU Use

For instructions on using GPUs and scheduling GPU jobs with SLURM, see the Slurm and GPU Use page.