Difference between revisions of "Nvidia CUDA Toolkit"

From UFRC

Revision as of 19:19, 31 May 2016

Description

cuda website  
CUDA™ is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for GPU computing with CUDA.

Required Modules

cuda

System Variables

  • HPC_CUDA_DIR
  • HPC_CUDA_BIN
  • HPC_CUDA_INC
  • HPC_CUDA_LIB

Available GPUs

Research Computing has a significant investment in GPU-enabled servers. Each server hosts between two and eight Nvidia GPUs (see the tables below).

Available GPUs under Torque/Moab on HPG1

GPU    Quantity  Host Quantity  Host Architecture  Host Memory  Host Interconnect  Host Attributes  Notes
M2070  8         4              Intel E5675        24 GB        QDR IB             fermi,m2070      cobra[1-4]
M2070  8         1              Intel E5620        24 GB        GigE               fermi,m2070      vette
M2090  4         1              Intel E5-2643      64 GB        FDR IB             fermi,m2090
M2090  14        7              AMD Opteron 6220   32 GB        QDR IB             fermi,m2090

Available GPUs under SLURM on HPG1

GPU    Quantity  Host Quantity  Host Architecture  Host Memory  Host Interconnect  Host Attributes  Notes
M2090  26        13             AMD Opteron 6220   32 GB        QDR IB             fermi,m2090

Available GPUs under SLURM on HPG2

GPU        Quantity  Host Quantity  Host Architecture  Host Memory  Host Interconnect  Host Attributes  Notes
Tesla K80  32        8              Intel E5-2683      132 GB       QDR IB             fermi,m2090

Usage Policy

Interactive Use

If you need interactive access to a GPU for development and testing, you can request an interactive session through the batch system.

To gain interactive access to a GPU server, run a command similar to one of the following.

Under SLURM

To get one GPU for the default 10-minute session:

srun -p hpg1-gpu --pty -u bash -i

OR

srun -p hpg2-gpu --pty -u bash -i

Under Torque/Moab

qsub -I -l nodes=1:gpus=1:tesla,walltime=01:00:00 -q gpu

To gain access to one of the Fermi-class GPUs, you can make a similar request but specify the "fermi" attribute in your resource request as below.

qsub -I -l nodes=1:gpus=1:fermi,walltime=01:00:00 -q gpu

If a GPU is available, you will get a prompt on a GPU-enabled host within a minute or two. Otherwise, you can either wait, in which case you will be connected as soon as a GPU becomes available, or try again later. The default walltime limit for the gpu queue is 10 minutes. Request the amount of time you need, but be sure to log out and end your session when you are finished so that the GPU is available to others.

If you need two GPUs in a single host, run the following command instead.

qsub -I -l nodes=1:gpus=2,walltime=01:00:00 -q gpu

If you need two GPUs in two separate hosts, request one GPU on each of two nodes.

qsub -I -l nodes=2:gpus=1,walltime=01:00:00 -q gpu
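
The Torque/Moab requests above all follow the pattern nodes=N:gpus=G[:attribute],walltime=HH:MM:SS. As a minimal sketch, the helper below composes and prints the corresponding qsub line so that it can be checked before it is run. The function name gpu_qsub_cmd is hypothetical (it is not a site-provided command), and the sketch assumes the gpu queue and the attribute names shown above.

```shell
# gpu_qsub_cmd NODES GPUS_PER_NODE ATTRIBUTE WALLTIME
# Hypothetical helper: prints (does not run) an interactive qsub request.
# Pass an empty string as ATTRIBUTE to accept any GPU type.
gpu_qsub_cmd() {
    nodes=$1
    gpus=$2
    attr=$3
    walltime=$4

    # Build the per-node part of the request, e.g. "nodes=1:gpus=1:fermi".
    request="nodes=${nodes}:gpus=${gpus}"
    if [ -n "$attr" ]; then
        request="${request}:${attr}"
    fi

    printf 'qsub -I -l %s,walltime=%s -q gpu\n' "$request" "$walltime"
}

# One Fermi-class GPU for one hour.
gpu_qsub_cmd 1 1 fermi 01:00:00
# One GPU on each of two hosts.
gpu_qsub_cmd 2 1 "" 01:00:00
```

Printing the command rather than executing it keeps the sketch safe to try on a login node; paste the printed line into the shell to submit the request.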

Batch Jobs

The process is much the same for batch jobs. To access a host with an M2090, you can add the following to your submission script.

#PBS -q gpu
#PBS -l nodes=1:gpus=1:m2090
#PBS -l walltime=1:00:00

To access a server with an M2070 GPU, you can add the following to your submission script.

#PBS -q gpu
#PBS -l nodes=1:gpus=1:m2070
#PBS -l walltime=1:00:00
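
Put together, a complete submission script might look like the sketch below. Only the three #PBS resource lines come from the examples above. The file name gpu_job.pbs, the job name gpu_test, and the executable ./my_cuda_app are placeholders, and the module load step assumes the cuda module described in the Environment section.

```shell
# Write a sample Torque/Moab job script to gpu_job.pbs (placeholder name).
cat > gpu_job.pbs <<'EOF'
#!/bin/bash
#PBS -N gpu_test
#PBS -q gpu
#PBS -l nodes=1:gpus=1:m2090
#PBS -l walltime=1:00:00

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Set up the CUDA compiler, headers, and libraries.
module load cuda

# Placeholder for your own CUDA executable.
./my_cuda_app
EOF

# Submit with: qsub gpu_job.pbs
```

To target an M2070 instead, change the attribute in the nodes line from m2090 to m2070.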

Exclusive Mode

The GPUs are configured to run in exclusive mode, which means that the GPU driver allows only one process at a time to access a given GPU. If GPU 0 is in use and your application tries to use it, the application will simply block. If your application does not call cudaSetDevice(), the CUDA runtime should assign it to a free GPU. Because all access to the GPUs goes through the batch system, the GPUs should not be over-subscribed.
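
To see how the GPUs on the current host are configured, the compute mode can be queried with nvidia-smi. This is a minimal sketch that assumes nvidia-smi is installed on the GPU host and that the installed driver supports the --query-gpu option; show_compute_mode is a hypothetical helper name.

```shell
# Hypothetical helper: list each GPU with its index, name, and compute mode.
# Falls back to a message when nvidia-smi is missing or the query fails,
# for example on a login node without a GPU.
show_compute_mode() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,compute_mode --format=csv,noheader \
            || echo "nvidia-smi query failed on this host"
    else
        echo "nvidia-smi not found; run this on a GPU host"
    fi
}

show_compute_mode
```

On a GPU in exclusive mode, the last column should report an exclusive setting rather than Default.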

Environment

For CUDA development please load the "cuda" module. Doing so will ensure that your environment is set up correctly for the use of the CUDA compiler, header files, and libraries.

$ module spider cuda
Rebuilding cache, please wait ... (not written to file) done

    Description:
      NVIDIA CUDA Toolkit

     Versions:
        cuda/4.2
        cuda/5.5

$ module load cuda/5.5

$ which nvcc
/opt/cuda/5.5/bin/nvcc

$ printenv | grep CUDA
HPC_CUDA_LIB=/opt/cuda/5.5/lib64
HPC_CUDA_DIR=/opt/cuda/5.5
HPC_CUDA_BIN=/opt/cuda/5.5/bin
HPC_CUDA_INC=/opt/cuda/5.5/include
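
The HPC_CUDA_* variables can be passed to a compiler so that build commands do not hard-code the toolkit path. The sketch below uses a hypothetical helper, cuda_flags, and assumes that the cuda module is already loaded and that the program links against the CUDA runtime library (-lcudart). The source file name in the comment is a placeholder.

```shell
# Hypothetical helper: print include and link flags from the module's variables.
cuda_flags() {
    # Stop with a message if the cuda module has not been loaded.
    if [ -z "${HPC_CUDA_INC:-}" ] || [ -z "${HPC_CUDA_LIB:-}" ]; then
        echo "HPC_CUDA_INC/HPC_CUDA_LIB not set; run 'module load cuda' first" >&2
        return 1
    fi
    printf '%s\n' "-I${HPC_CUDA_INC} -L${HPC_CUDA_LIB} -lcudart"
}

# Example: compile a host program that calls the CUDA runtime API.
# (main.cpp is a placeholder.)
#   g++ main.cpp $(cuda_flags) -o main
```

Sources compiled directly with nvcc normally do not need these flags, since nvcc locates its own headers and libraries; they are mainly useful when a host compiler performs the final link.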

Sample GPU Job Scripts

SLURM Job Scripts
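
As a starting point, a SLURM batch script for a single GPU might look like the sketch below. The partition name comes from the interactive examples earlier on this page. The --gres=gpu:1 request, the one-hour time limit, the file name gpu_job.sh, the job name, and the executable ./my_cuda_app are assumptions to adjust for your own job.

```shell
# Write a sample SLURM job script to gpu_job.sh (placeholder name).
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --partition=hpg2-gpu
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=gpu_test_%j.out

# Set up the CUDA compiler, headers, and libraries.
module load cuda

# Placeholder for your own CUDA executable.
./my_cuda_app
EOF

# Submit with: sbatch gpu_job.sh
```

Use hpg1-gpu as the partition to run on the HPG1 GPU hosts instead.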

PBS Script Examples

See the Nvidia CUDA Toolkit_PBS page for CUDA PBS script examples.

CUDA Examples

See the Nvidia CUDA Toolkit_Examples page for CUDA development examples.