Researchers may use GPUs in the form of Normalized Graphics Processor Units (NGUs), which include all of the infrastructure (memory, network, rack space, cooling) necessary for GPU-accelerated computation.

Groups that do not have GPU allocations can invest in GPUs by filling out the purchase form at https://www.rc.ufl.edu/services/purchase-request/.

GPU-enabled Services

We offer two types of GPU services, each intended for a different kind of application.

Hardware Accelerated GUI

GPUs in these servers are used to accelerate rendering for graphical applications. These servers are in the SLURM "hwgui" partition. Refer to the Hardware Accelerated GUI Sessions page for more information on available resources and usage.

GPU Assisted Computation

A number of high-performance applications installed on HiPerGator implement GPU-accelerated computing functions via CUDA to achieve significant speed-up over CPU implementations. The servers that provide these GPUs are in the SLURM "gpu" partition (--partition=gpu).

Hardware Specifications for the GPU Partition

The following types of NVIDIA GPU nodes are currently available in the "gpu" partition (the table below additionally lists Quadro RTX 6000 hosts):

NVIDIA Tesla K80, with 2 GPUs per K80 card and 2 K80 cards in one host. Please refer to the K80 technical specs.
NVIDIA GeForce GTX 1080 Ti, with 1 GPU per 1080 Ti card and 2 1080 Ti cards in one host. Please refer to the 1080 Ti technical specs.
NVIDIA GeForce RTX 2080 Ti, with 1 GPU per 2080 Ti card and 8 2080 Ti cards in one host. Please refer to the 2080 Ti technical specs.

GPU	Host Quantity	Host Architecture	Host Memory	Host Interconnect	CPUs per Host	CPUs per Socket	GPUs per Host	CPUs per GPU	Memory per GPU
Tesla K80	20	Intel Haswell	128 GB	FDR IB	28	14	4	7	12 GB
GeForce 1080Ti	1	Intel Haswell	128 GB	FDR IB	28	14	2	14	11 GB
GeForce 2080Ti	13	Intel Skylake	192 GB	EDR IB	32	16	8	4	11 GB
Quadro RTX 6000	24	(not specified)	187 GB	EDR IB	32	16	8	4	23 GB
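
The "CPUs per GPU" column can serve as a rough guide to how many CPU cores to request alongside each GPU. The following is a minimal sketch only: max_cpus_for_gpus is a hypothetical helper (not part of SLURM or any HiPerGator tooling), and the keys k80, 1080ti, 2080ti, and rtx6000 are illustrative labels rather than SLURM GPU type names. The per-GPU core counts are copied from the table.

```shell
#!/bin/bash
# Hypothetical helper: prints the number of CPU cores that corresponds to a
# given number of GPUs, using the "CPUs per GPU" column of the table.
# The GPU keys below are illustrative labels, not SLURM type names.
max_cpus_for_gpus() {
    gpu_type="$1"
    gpu_count="$2"
    case "$gpu_type" in
        k80)     cpus_per_gpu=7  ;;
        1080ti)  cpus_per_gpu=14 ;;
        2080ti)  cpus_per_gpu=4  ;;
        rtx6000) cpus_per_gpu=4  ;;
        *) echo "unknown GPU type: $gpu_type" >&2; return 1 ;;
    esac
    echo $(( cpus_per_gpu * gpu_count ))
}

# Example: cores that match two 2080Ti GPUs (4 cores per GPU)
max_cpus_for_gpus 2080ti 2
```

Requesting more cores per GPU than this ratio may leave GPUs on the host without matching cores for other jobs.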

Compiling CUDA Enabled Programs

To compile CUDA programs, please refer to the Nvidia CUDA Toolkit page. The current CUDA environment is cuda/10.

GPU Use Under Slurm

Policy

GPUs are allocated only via the investment QOS.
To increase the availability of GPU resources, the time limit for the gpu partition is 7 days (at most #SBATCH --time=7-00:00:00).
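
A requested time can be checked against the 7-day limit before submission. The sketch below assumes two hypothetical helper functions, slurm_time_to_seconds and within_gpu_time_limit, and handles only the D-HH:MM:SS and HH:MM:SS forms; other time formats that SLURM accepts are not covered.

```shell
#!/bin/bash
# Hypothetical helper: converts D-HH:MM:SS or HH:MM:SS to seconds.
# Other SLURM time formats are intentionally not handled in this sketch.
slurm_time_to_seconds() {
    case "$1" in
        *-*) spec="$1" ;;      # already has a day field
        *)   spec="0-$1" ;;    # prepend zero days
    esac
    # Split on "-" and ":" into days, hours, minutes, seconds
    echo "$spec" | awk -F'[-:]' '{ print (($1 * 24 + $2) * 60 + $3) * 60 + $4 }'
}

# Hypothetical helper: succeeds if the time is within the 7-day limit.
within_gpu_time_limit() {
    limit=$(( 7 * 24 * 3600 ))
    [ "$(slurm_time_to_seconds "$1")" -le "$limit" ]
}

if within_gpu_time_limit "7-00:00:00"; then
    echo "time request is within the gpu partition limit"
else
    echo "time request exceeds the 7-day limit"
fi
```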

Interactive Access

To request interactive access to a GPU under SLURM, use commands similar to the following.

• To request access to one GPU (of any type) for a default 10-minute session:

srun -p gpu --gpus=1 --pty -u bash -i

• To request access to two Tesla GPUs on a single node for a 1-hour session:

srun -p gpu --gpus=tesla:2 --time=01:00:00  --pty -u bash -i

• To request access to two GeForce GPUs on a single node for a 1-hour session:

srun -p gpu --gpus=geforce:2 --time=01:00:00  --pty -u bash -i

• To request access to one GPU on a node in the cuda/9 environment for a 1-hour session:

srun -p gpu --gpus=1 --constraint=cuda9 -t 01:00:00 --pty -u bash -i
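
Once an interactive session starts, it can be useful to confirm which GPUs the session can see. The sketch below assumes a hypothetical helper named show_gpu_allocation. It relies on the CUDA_VISIBLE_DEVICES variable, which SLURM normally sets for jobs with GPU allocations (an assumption about this cluster's configuration), and on nvidia-smi -L, which lists the visible devices.

```shell
#!/bin/bash
# Hypothetical helper to run inside the interactive session: reports which
# GPUs the session can see.
show_gpu_allocation() {
    # Assumption: SLURM exports CUDA_VISIBLE_DEVICES for GPU allocations.
    echo "Visible GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        # -L prints one line per GPU visible to this session
        nvidia-smi -L || echo "nvidia-smi could not query the driver"
    else
        echo "nvidia-smi not found on this host"
    fi
}

show_gpu_allocation
```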

Batch Jobs

To request GPU resources in a batch job, use lines similar to the following in your submission script.

• In this example, two Tesla GPUs on a single server (--nodes defaults to "1") will be allocated to the job:

#SBATCH --partition=gpu
#SBATCH --gpus=tesla:2

• In this example, two GeForce GPUs on a single server (--nodes defaults to "1") will be allocated to the job:

#SBATCH --partition=gpu
#SBATCH --gpus=geforce:2

• In this example, two GPUs of any type on a single server (--nodes defaults to "1") with the cuda/9 environment will be allocated to the job:

#SBATCH --partition=gpu
#SBATCH --gpus=2
#SBATCH --constraint=cuda9

Alternatively, use the '--gres=gpu:1' or '--gres=gpu:geforce:1' format. Note that if the '--gpus=' format is used, SLURM will not report GPU usage data to slurmInfo, and those GPUs will not be counted.
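
For reference, a complete submission script using the '--gres' form might look like the sketch below. The job name gpucheck and the helper count_visible_gpus are hypothetical, and the script assumes that SLURM sets CUDA_VISIBLE_DEVICES to a comma-separated list of device indices for the job.

```shell
#!/bin/bash
#SBATCH --job-name=gpucheck
#SBATCH --partition=gpu
#SBATCH --gres=gpu:geforce:2
#SBATCH --time=00:10:00

# Hypothetical helper: counts the entries in a comma-separated device list.
count_visible_gpus() {
    if [ -z "$1" ]; then
        echo 0
    else
        echo "$1" | awk -F, '{ print NF }'
    fi
}

# Assumption: SLURM sets CUDA_VISIBLE_DEVICES for the allocated GPUs.
echo "GPUs allocated to this job: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

Run outside a job, the script simply reports however many devices the current environment exposes, which is zero if the variable is unset.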

If no GPUs are available when you request an interactive session, your request will be queued and your connection established once a GPU becomes available. Alternatively, you may cancel your job and try connecting again at a later time. If you have requested more time than you need, please be sure to end your session when you finish so that the GPU becomes available to other users.

Exclusive Mode

The GPUs are configured to run in exclusive mode. This means that the GPU driver will allow only one process at a time to access a given GPU. If GPU 0 is in use and your application tries to use it, the application will simply block. If your application does not call cudaSetDevice(), the CUDA runtime should assign it to a free GPU. Because all users access the GPUs through the batch system, the GPUs should not be oversubscribed.
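
The compute mode of the GPUs visible to a job can be inspected with nvidia-smi. This is a minimal sketch, assuming a hypothetical helper named report_compute_mode; on GPUs configured as described here, the compute_mode column would be expected to show an exclusive setting rather than Default, though the exact value depends on how the cluster is configured.

```shell
#!/bin/bash
# Hypothetical helper: prints the index, name, and compute mode of each
# GPU visible to the current session.
report_compute_mode() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,compute_mode --format=csv \
            || echo "nvidia-smi could not query the driver"
    else
        echo "nvidia-smi not found on this host"
    fi
}

report_compute_mode
```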

Job Script Examples

MPI Parallel

This is a sample script for an MPI-parallel VASP job that requests and uses GPUs under SLURM:

#!/bin/bash
#SBATCH --job-name=vasptest
#SBATCH --output=vasp.out
#SBATCH --error=vasp.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@ufl.edu
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=8
#SBATCH --ntasks-per-socket=4
#SBATCH --mem-per-cpu=7000mb
#SBATCH --distribution=cyclic:cyclic
#SBATCH --partition=gpu
#SBATCH --gres=gpu:geforce:4
#SBATCH --time=00:30:00

echo "Date      = $(date)"
echo "host      = $(hostname -s)"
echo "Directory = $(pwd)"

module purge
module load cuda/10.0.130  intel/2018  openmpi/4.0.0 vasp/5.4.4

# Record wall-clock time (seconds since the epoch) before and after the run
T1=$(date +%s)
srun --mpi=pmix_v3 vasp_gpu
T2=$(date +%s)

ELAPSED=$((T2 - T1))
echo "Elapsed Time = $ELAPSED seconds"
