GPU Access

From UFRC

Researchers may use GPUs in the form of Normalized Graphics Processor Units (NGUs), which include all of the infrastructure (memory, network, rack space, cooling) necessary for GPU-accelerated computation.

Groups that do not have GPU allocations can invest in GPUs by filling out the purchase form at: https://www.rc.ufl.edu/services/purchase-request/.

GPU-enabled Servers

We have two types of GPU services for two different kinds of applications.

Hardware Accelerated GUI

GPUs are used for hardware-accelerated graphical applications. To run this type of application on HiPerGator, use the SLURM partition "hwgui" and refer to Hardware Accelerated GUI Sessions for usage details.

GPU Assisted Computation

A number of high-performance applications installed on HiPerGator implement GPU-accelerated computing functions via CUDA to achieve significant speed-up over CPU implementations. Use the SLURM partition "gpu" to run GPU-enabled computational applications.

GPU Specification for GPU Partition

Three types of NVIDIA GPU nodes are currently available in the "gpu" partition:

  • Nvidia K80s, with 2 GPUs per K80 card and 2 K80 cards in one host. Please refer to K80 technical specs
  • Nvidia GeForce GTX 1080 Ti, with 1 GPU per 1080Ti card and 2 1080Ti cards in one host. Please refer to 1080Ti technical specs
  • Nvidia GeForce RTX 2080Ti, with 1 GPU per 2080Ti card and 8 2080Ti cards in one host. Please refer to 2080Ti technical specs

| GPU | Quantity | Host Quantity | Host Architecture | Host Memory | Host Interconnect | CPUs per Host | GPUs per Host | Memory per GPU |
|---|---|---|---|---|---|---|---|---|
| Tesla K80 | 80 | 20 | Intel E5-2683 | 128 GB | FDR IB | 28 | 4 | 12 GB |
| GeForce 1080Ti | 2 | 1 | Intel E5-2683 | 128 GB | FDR IB | 28 | 2 | 11 GB |

Compile CUDA Enabled Programs

To compile CUDA programs, please refer to the Nvidia CUDA Toolkit page.

GPU Use Policy

Warning:

  • GPUs are allocated only via the investment QOS. There is no burst QOS in the gpu partition. Because GPU cards are expensive, HiPerGator has few GPUs and no spare capacity. Purchased GPUs need to remain available to the users who invested in GPU resources.
  • The time limit for the gpu partition is 7 days (at most #SBATCH --time=7-00:00:00) to increase the availability of GPU resources.
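
As an illustration of the 7-day limit, the following is a minimal sketch that checks a requested --time value before a job is submitted. The function names slurm_time_to_seconds and fits_gpu_limit are hypothetical helpers, not part of SLURM, and the sketch assumes the time is written as HH:MM:SS or D-HH:MM:SS (the two forms used on this page); SLURM itself accepts additional forms that are not handled here.

```shell
#!/bin/bash
# Hypothetical helper: convert "HH:MM:SS" or "D-HH:MM:SS" to seconds.
# Other SLURM time formats (e.g. plain minutes) are NOT handled in this sketch.
slurm_time_to_seconds() {
    local spec="$1" days=0 h m s rest
    case $spec in
        *-*) days=${spec%%-*}; spec=${spec#*-} ;;
    esac
    h=${spec%%:*}; rest=${spec#*:}
    m=${rest%%:*}; s=${rest#*:}
    # Strip one leading zero so values such as "08" are not read as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( days * 86400 + ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# 7 days, the gpu partition limit stated above.
GPU_MAX_SECONDS=$(( 7 * 86400 ))

# Hypothetical helper: succeeds if the requested time is within the limit.
fits_gpu_limit() {
    [ "$(slurm_time_to_seconds "$1")" -le "$GPU_MAX_SECONDS" ]
}

requested="6:00:00"
if fits_gpu_limit "$requested"; then
    echo "$requested fits within the gpu partition time limit"
else
    echo "$requested exceeds the gpu partition time limit" >&2
fi
```

A check like this is only a convenience; the scheduler remains the authority on what it accepts.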

Interactive Access (SLURM)

To request interactive access to a GPU under SLURM, use commands similar to the following.

To request access to one GPU (of any type) for a default 10-minute session:
srun -p gpu --gres=gpu:1 --pty -u bash -i
To request access to two Tesla GPUs on a single node for a 1-hour session:
srun -p gpu --gres=gpu:tesla:2 --time=01:00:00 --pty -u bash -i

If no GPUs are available, your request will be queued and your connection will be established once a GPU becomes available. Alternatively, you may cancel the request and try again later. If you requested more time than you need, please end your session when you are done so that the GPU becomes available to other users.
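
The two srun commands above differ only in GPU count, GPU type, and time limit. The sketch below wraps that pattern in a small function that prints the command instead of running it, so it can be inspected first. The name gpu_session_cmd and its argument order are hypothetical; the srun options are the ones shown above.

```shell
#!/bin/bash
# Hypothetical helper: print an interactive GPU request.
#   $1 = number of GPUs, $2 = time limit, $3 = optional GPU type (e.g. tesla)
gpu_session_cmd() {
    local count="$1" time_limit="$2" gpu_type="${3:-}"
    local gres="gpu:${count}"
    if [ -n "$gpu_type" ]; then
        gres="gpu:${gpu_type}:${count}"
    fi
    echo "srun -p gpu --gres=${gres} --time=${time_limit} --pty -u bash -i"
}

# Print the request for two Tesla GPUs for one hour.
gpu_session_cmd 2 01:00:00 tesla
```

Requesting only the time you expect to need keeps the request short, and typing exit in the interactive shell ends the session and releases the GPU.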

Batch Jobs (SLURM)

For batch jobs, to request GPU resources, use lines similar to the following in your submission script.

In this example, two Tesla GPUs on a single server (--nodes defaults to "1") will be allocated to the job:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:tesla:2

Exclusive Mode

The GPUs are configured to run in exclusive mode. This means that the GPU driver allows only one process at a time to access a given GPU. If GPU 0 is in use and your application tries to use it, your application will simply block. If your application does not call cudaSetDevice(), the CUDA runtime should assign it to a free GPU. Since everyone accesses the GPUs through the batch system, there should be no over-subscription of the GPUs.
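
Because only one process can use a GPU at a time, a job that starts more GPU-using processes than it has GPUs may leave some of them blocked. The sketch below compares the two numbers inside a job script. It assumes that SLURM exports CUDA_VISIBLE_DEVICES as a comma-separated list for jobs that request --gres=gpu (common SLURM behavior, but not confirmed on this page) and that each task uses one GPU. The function names count_gpus and gpus_cover_tasks are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count entries in a comma-separated device list ("0,1" -> 2).
count_gpus() {
    if [ -z "$1" ]; then
        echo 0
        return 0
    fi
    local commas
    commas=$(printf '%s' "$1" | tr -cd ',' | wc -c)
    echo $(( commas + 1 ))
}

# Hypothetical helper: succeeds if there is at least one GPU per task.
#   $1 = number of GPU-using tasks, $2 = device list
gpus_cover_tasks() {
    [ "$1" -le "$(count_gpus "$2")" ]
}

# SLURM_NTASKS is set by SLURM when --ntasks is given; default to 1 otherwise.
# Assumption: CUDA_VISIBLE_DEVICES lists the GPUs allocated to this job.
if ! gpus_cover_tasks "${SLURM_NTASKS:-1}" "${CUDA_VISIBLE_DEVICES:-}"; then
    echo "Warning: more tasks than visible GPUs; some tasks may block" >&2
fi
```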

Job Script Examples

Hybrid MPI/Threaded

This is a sample script for a hybrid MPI/threaded Gromacs job requesting and using GPUs under SLURM:

#!/bin/bash
#SBATCH --job-name=gromacs_gpu
#SBATCH --output=gromacs_%j.out
#SBATCH --error=gromacs_%j.err
#SBATCH --partition=gpu
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@some.domain.com
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=7
#SBATCH --ntasks-per-socket=1
#SBATCH --mem-per-cpu=2600mb
#SBATCH --distribution=cyclic:block
#SBATCH --gres=gpu:tesla:2
#SBATCH --time=6:00:00

echo "Date      = $(date)"
echo "host      = $(hostname -s)"
echo "Directory = $(pwd)"

module load intel/2017 openmpi/3.0.0 cuda/9.1.85 gromacs/2018

GROMACS=gmx
export OMP_NUM_THREADS=7

T1=$(date +%s)
srun --mpi=pmix $GROMACS mdrun -v -deffnm topol
T2=$(date +%s)

ELAPSED=$((T2 - T1))
echo "Elapsed Time = $ELAPSED"
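
To make the resource request in the script above easier to read, the following sketch works out the totals implied by its #SBATCH lines. The variable names are illustrative only; the per-host figures (28 CPUs and 4 GPUs) are taken from the Tesla K80 row of the hardware table on this page.

```shell
#!/bin/bash
# Values copied from the #SBATCH lines of the sample script.
ntasks=2            # --ntasks
cpus_per_task=7     # --cpus-per-task (also used for OMP_NUM_THREADS)
mem_per_cpu_mb=2600 # --mem-per-cpu
gpus=2              # --gres=gpu:tesla:2

# Values from the Tesla K80 row of the hardware table.
host_cpus=28
host_gpus=4

total_cpus=$(( ntasks * cpus_per_task ))
total_mem_mb=$(( total_cpus * mem_per_cpu_mb ))
tasks_per_gpu=$(( ntasks / gpus ))
host_cpus_per_gpu=$(( host_cpus / host_gpus ))

echo "CPU cores requested : $total_cpus"
echo "Memory requested    : $total_mem_mb MB"
echo "MPI tasks per GPU   : $tasks_per_gpu"
echo "Host CPUs per GPU   : $host_cpus_per_gpu"
```

With these values, each MPI task drives one GPU with 7 threads, which matches the 7 CPU cores per GPU available on a K80 host, so the job occupies half of such a host's cores and half of its GPUs.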