GPU Access
Note: Interactive Open OnDemand jobs in the GPU partition are limited to 12 hours. Computational GPU jobs are limited to 14 days. Each GPU job requires at least one CPU core.
Revision as of 20:15, 16 April 2024
Normalized Graphics Processor Units (NGUs) include all of the infrastructure (memory, network, rack space, cooling) necessary for GPU-accelerated computation. Currently, each NGU is equivalent to one GPU; however, newer GPUs such as the A100 may require more than one NGU in the future.
Researchers can add NGUs to their allocations by filling out the Purchase Form (https://www.rc.ufl.edu/get-started/purchase-allocation/) or requesting a Trial Allocation (https://www.rc.ufl.edu/services/request-trial-allocation/).
Open OnDemand Access
To access GPUs using Open OnDemand (https://help.rc.ufl.edu/doc/Open_OnDemand), check the form for your application. If your application supports multiple GPU types, choose the GPU partition and specify the number and type of GPUs:
- To request one GPU of any type, use this gres string:
gpu:1
- To request multiple GPUs of any type, use this gres string, where n is the number of GPUs you need:
gpu:n
- To request a specific type of GPU, use this gres string (requesting a geforce GPU in this example):
gpu:geforce:1
- To request an A100 GPU, use this gres string:
gpu:a100:1
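The gres strings above all follow the pattern gpu[:type]:count. As a minimal sketch, the hypothetical helper below (gres_string is an illustrative name, not an HPG or Slurm tool) assembles such a string from a count and an optional GPU type:

```shell
# Hypothetical helper: build a gres string from a GPU count and an optional type.
# Usage: gres_string COUNT [TYPE]
gres_string() {
    local count="$1"
    local type="${2:-}"
    if [ -n "$type" ]; then
        # A type was given, e.g. a100 or geforce
        printf 'gpu:%s:%s\n' "$type" "$count"
    else
        # No type: any available GPU
        printf 'gpu:%s\n' "$count"
    fi
}

gres_string 1            # prints gpu:1
gres_string 2 a100       # prints gpu:a100:2
```

The printed value can be pasted into the gres field of an Open OnDemand form.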
GPU-enabled Services
Types of GPUs are listed below. Two partitions contain GPUs: the hwgui partition for visualization and the gpu partition for general computation.
Hardware Accelerated GUI
GPUs in these servers are used to accelerate rendering for graphical applications. These servers are in the SLURM "hwgui" partition. Refer to the Hardware Accelerated GUI Sessions page for more information on available resources and usage.
GPU Assisted Computation
A number of high-performance applications installed on HiPerGator implement GPU-accelerated computing functions via CUDA to achieve significant speed-up over CPU calculations. These servers are in the SLURM "gpu" partition (--partition=gpu).
Hardware Specifications for the GPU Partition
We have the following types of NVIDIA GPU nodes available in the "gpu" partition:
For a list of additional node features, see the Available Node Features page.
To select a specific type of GPU within a partition, use either a SLURM constraint (e.g. --constraint=rtx6000) or a GRES request that names the GPU type (e.g. --gres=gpu:a100:1 or --gpus=a100:1).
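The options above can be combined in a batch script. The following is a minimal sketch rather than an official HPG template: the job name, time limit, and the report_gpus function are illustrative assumptions, and it assumes Slurm exports CUDA_VISIBLE_DEVICES for GPU jobs, which depends on site configuration.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-check      # hypothetical job name
#SBATCH --partition=gpu           # the "gpu" partition described above
#SBATCH --gpus=a100:1             # one A100; for a single-node job, --gres=gpu:a100:1 is the GRES form
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1         # each GPU job needs at least one CPU core
#SBATCH --time=01:00:00           # illustrative time limit

# Print which GPUs are visible to this job.
report_gpus() {
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        # List the GPUs; do not abort the job if none is usable
        nvidia-smi -L || echo "nvidia-smi found no usable GPU"
    fi
}

report_gpus
```

To select by node feature instead of GPU type, the --gpus line could be replaced with --gpus=1 together with --constraint=rtx6000.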
Compiling CUDA Enabled Programs
The most direct way to develop a custom GPU-accelerated algorithm is CUDA programming; see the Nvidia CUDA Toolkit page for details. The current CUDA environment is cuda/11. C++ and the Python packages numba and PyCUDA are other ways to program GPU algorithms.
Conda Environments with GPU
To make sure your code will run on GPUs, install a recent cudatoolkit package that works with the NVIDIA drivers on HPG (currently 12.x, but older versions are still supported) alongside the pytorch or tensorflow package(s). See the RC-provided tensorflow or pytorch installs for examples if needed. Mamba can detect whether a GPU is present in the environment, so the easiest approach is to run the mamba install command in a GPU session. Alternatively, on any node, or if a CPU-only pytorch package is already installed, explicitly request a GPU build of pytorch when running mamba install, e.g.:
mamba install cudatoolkit=11.3 pytorch=1.12.1=gpu_cuda* -c pytorch
See also Conda.
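After the install, it can help to confirm that the environment contains a GPU build and that a GPU is visible. The sketch below assumes the conda environment is already activated and that it is run inside a GPU session; check_torch_gpu is a hypothetical helper name, while torch.version.cuda and torch.cuda.is_available() are standard PyTorch attributes.

```shell
# Hypothetical helper: report whether the active environment has a GPU-enabled pytorch.
check_torch_gpu() {
    if ! command -v python >/dev/null 2>&1; then
        echo "python not found; activate the conda environment first"
        return 0
    fi
    # torch.version.cuda is None for CPU-only builds;
    # torch.cuda.is_available() is False when no GPU is visible to the process.
    python -c 'import torch; print("CUDA build:", torch.version.cuda); print("GPU visible:", torch.cuda.is_available())' 2>/dev/null \
        || echo "pytorch is not importable in this environment"
}

check_torch_gpu
```

A CUDA build of None suggests a CPU-only package was installed, in which case the explicit GPU build string shown above is the likely fix.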
Multiple GPUs
For multi-GPU training, see the Multi-GPU Training resource (https://github.com/YunchaoYang/MultiGPUTraining2023).
Slurm and GPU Use
For instructions on using GPUs and scheduling GPU jobs with SLURM, see the Slurm and GPU Use page.