Nvidia CUDA Toolkit

Description

cuda website
CUDA™ is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for GPU computing with CUDA.

Required Modules

cuda

System Variables

HPC_CUDA_DIR
HPC_CUDA_BIN
HPC_CUDA_INC
HPC_CUDA_LIB
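
As an illustration of how these variables might be used, the sketch below builds a compile command for a host program that links against the CUDA runtime. It assumes that loading the cuda module sets HPC_CUDA_INC and HPC_CUDA_LIB to the toolkit's include and library directories. The file name host_app.cpp and the fallback paths are hypothetical placeholders.

```shell
# Assumption: `module load cuda` sets HPC_CUDA_INC and HPC_CUDA_LIB to the
# toolkit's include and library directories. The fallbacks are placeholders.
cuda_inc="${HPC_CUDA_INC:-/path/to/cuda/include}"
cuda_lib="${HPC_CUDA_LIB:-/path/to/cuda/lib64}"

# host_app.cpp is a hypothetical C++ source that calls the CUDA runtime API.
compile_cmd="g++ -I${cuda_inc} -L${cuda_lib} -o host_app host_app.cpp -lcudart"

# Print the command for inspection instead of running it.
echo "${compile_cmd}"
```

The command is only printed, so the sketch can be checked on a login node before it is run on a GPU node.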

Available GPUs

Research Computing has a significant investment in GPU-enabled servers. Each supports from two to eight Nvidia GPUs that range from the earlier S1070s (Tesla T10) to the more recent Kepler K20s (see table below).

GPU	Quantity	Host Quantity	Host Architecture	Host Memory	Host Interconnect	Notes
S1070	8	4	Intel E5462	16 GB	DDR IB
M2070	8	4	Intel E5675	24 GB	QDR IB
M2070	8	1	Intel E5620	24 GB	N/A
M2090	10	5	Intel E5620	64 GB	FDR IB
M2090	16	8	AMD Opteron 6220	32 GB	QDR IB
K20c	2	1	Intel E5675	24 GB	QDR IB	Reserved

Usage Policy

Interactive Use

If you need interactive access to a GPU for development and testing, you may request an interactive session through the batch system.

To gain interactive access to a GPU server, run a command similar to the one that follows.

qsub -I -l nodes=1:gpus=1:tesla,walltime=01:00:00 -q gpu

To gain access to one of the Fermi-class GPUs, you can make a similar request but specify the "fermi" attribute in your resource request as below.

qsub -I -l nodes=1:gpus=1:fermi,walltime=01:00:00 -q gpu

If a GPU is available, you will get a prompt on one of the nodes within a minute or two. Otherwise, you will have to wait or try another time. If you choose to wait, you will be connected when a GPU becomes available. The default walltime limit for the gpu queue is 10 minutes. Request the amount of time you need, but be sure to log out and end your session when you are finished so that the GPU will be available to others.

If your work needs both GPUs attached to the same node, you would run the following command instead.

qsub -I -l nodes=1:gpus=2,walltime=01:00:00 -q gpu

If you need to request a particular machine, say tesla1, you would use the following qsub command.

qsub -I -l nodes=tesla1:gpus=1,walltime=01:00:00 -q gpu
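
The requests above follow one pattern: a GPU count, an optional attribute such as tesla or fermi, and a walltime. The helper below is a minimal sketch that assembles that resource string. The function name gpu_request and its argument order are hypothetical; only the output format is taken from the commands above.

```shell
# gpu_request COUNT [ATTRIBUTE] [WALLTIME]
# Prints a resource string for `qsub -l`. The function name and argument
# order are hypothetical; the output format follows the commands above.
gpu_request() {
    gpu_count="$1"
    gpu_attr="${2:-}"
    gpu_walltime="${3:-01:00:00}"
    spec="nodes=1:gpus=${gpu_count}"
    # Append the attribute (for example tesla or fermi) only when given.
    if [ -n "${gpu_attr}" ]; then
        spec="${spec}:${gpu_attr}"
    fi
    printf '%s,walltime=%s\n' "${spec}" "${gpu_walltime}"
}

# Print (rather than run) the request for one Fermi-class GPU for one hour.
echo "qsub -I -l $(gpu_request 1 fermi) -q gpu"
```

Calling gpu_request 2 with no attribute yields the two-GPU form of the request.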

Batch Jobs

The process is much the same for batch jobs. To access a node with an M2090, you can add the following to your submission script.

#PBS -q gpu
#PBS -l nodes=1:gpus=1:M2090
#PBS -l walltime=1:00:00

To access a node with an M2070 GPU, you can add the following to your submission script.

#PBS -q gpu
#PBS -l nodes=1:gpus=1:m2070
#PBS -l walltime=1:00:00
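
For context, the directives above might sit in a complete submission script like the one sketched below. The snippet writes the script to a file without submitting it. The file name gpu_job.pbs, the job name gpu_test, and the executable ./my_cuda_app are hypothetical, and the script assumes that nvidia-smi is installed on the GPU nodes.

```shell
# Write a minimal job script to a file without submitting it.
# gpu_job.pbs, gpu_test, and ./my_cuda_app are hypothetical names.
cat > gpu_job.pbs <<'EOF'
#!/bin/bash
#PBS -N gpu_test
#PBS -q gpu
#PBS -l nodes=1:gpus=1:m2070
#PBS -l walltime=1:00:00

# Load the CUDA module listed under Required Modules.
module load cuda

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Record which GPU the job can see, then run the application.
nvidia-smi
./my_cuda_app
EOF

echo "Wrote gpu_job.pbs; submit it with: qsub gpu_job.pbs"
```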

Exclusive Mode

The GPUs are configured to run in exclusive mode. This means that the GPU driver will allow only one process at a time to access each GPU. If GPU 0 is in use and your application tries to use it, the application will simply block. If your application does not call cudaSetDevice(), the CUDA runtime should assign it to a free GPU. Since everyone will be accessing the GPUs through the batch system, there should be no over-subscription of the GPUs.
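
One way to confirm the compute mode from inside a session is to query it with nvidia-smi. The sketch below assumes nvidia-smi is available on the GPU nodes and that its CSV output has the form "index, compute_mode". The helper name exclusive_gpus is hypothetical.

```shell
# Print the index of every GPU whose compute mode is not "Default".
# Reads "index, compute_mode" CSV lines on stdin, as produced by:
#   nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader
# The function name is hypothetical.
exclusive_gpus() {
    awk -F', *' '$2 != "Default" { print $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader | exclusive_gpus
else
    echo "nvidia-smi not found; run this on a GPU node" >&2
fi
```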

PBS Script Examples

See the Nvidia CUDA Toolkit_PBS page for CUDA PBS script examples.