Difference between revisions of "New user training"

From UFRC
Line 1: Line 1:
== New User Training ==
 
 
 
This page mirrors and expands upon the content provided in the New User Training module in myTraining. The New User Training module is required for all new account holders within two weeks of obtaining a new account. Users who do not complete the training will have their account deactivated until the training is completed.
 
  
===Training Objectives===
+
==Training Objectives==
 
# Recognize the role of Research Computing, utilize HiPerGator as a research tool and select appropriate resource allocations for analyses
# Log in to HiPerGator using an SSH client
Line 13: Line 11:
  
  
===Module 1: Introduction to Research Computing and HiPerGator===
+
==Module 1: Introduction to Research Computing and HiPerGator==
  
====HiPerGator====
+
===HiPerGator===
 
* 46,000 cores
* Hundreds of GPUs
Line 27: Line 25:
 
'''Summary:''' HiPerGator is a large, high-performance compute cluster capable of tackling some of the largest computational challenges, but users need to understand how to responsibly and efficiently use the resources.
 
  
====Investor Supported====
+
===Investor Supported===
 
HiPerGator is heavily subsidized by the university, but we do require faculty researchers to make investments for access. Research Computing sells three main products:
 
# Compute: NCUs (Normalized Compute Units)
 
Line 42: Line 40:
 
* Submit a purchase request here: https://www.rc.ufl.edu/services/purchase-request/
 
  
===Module 2: How to Access and Run Jobs===
+
==Module 2: How to Access and Run Jobs==
  
====Cluster Components====
+
===Cluster Components===
 
# [[Development and Testing|Login servers]]
# [[Sample SLURM Scripts|SLURM Scheduler]]
# [https://www.rc.ufl.edu/about/cluster-history/ Compute Cluster]
  
====Accessing HiPerGator====
+
===Accessing HiPerGator===
 
* [[Training#Connecting_to_HiPerGator|ssh to host hpg.rc.ufl.edu]]
* [https://jhub.rc.ufl.edu/ jhub.rc.ufl.edu] (requires UF Network)
Line 57: Line 55:
 
** See also [[Training#Using_Open_on_Demand_on_HiPerGator|overview video]].
 
  
====Proper use of Login Nodes====
+
===Proper use of Login Nodes===
 
Generally speaking, interactive work other than managing jobs and data is ''discouraged'' on the login nodes. Login nodes are intended for file and job management, and short-duration testing and development.
 
  
Line 67: Line 65:
 
* No more than 64 GB of RAM
 
  
====Resources for Scheduling a Job====
+
===Resources for Scheduling a Job===
 
For use beyond what is acceptable on the login servers, you can request resources on development servers or GPU servers, work through JupyterHub, Galaxy, or Graphical User Interface servers via Open on Demand, or submit batch jobs. All of these services work with the scheduler to allocate your requested resources so that your computations run efficiently and do not impact other users.
  
Line 77: Line 75:
 
* [https://jhub.rc.ufl.edu/ Jupyter Hub]
 
  
====Scheduling a Job====
+
===Scheduling a Job===
 
# Understand the resources that your analysis will use:
#* '''CPUs''': Can your job use multiple CPU cores? Does it scale?

Revision as of 16:31, 11 August 2020

This page mirrors and expands upon the content provided in the New User Training module in myTraining. The New User Training module is required for all new account holders within two weeks of obtaining a new account. Users who do not complete the training will have their account deactivated until the training is completed.

Training Objectives

  1. Recognize the role of Research Computing, utilize HiPerGator as a research tool and select appropriate resource allocations for analyses
  2. Log in to HiPerGator using an SSH client
  3. Describe appropriate use of the login servers and how to request resources for work beyond those limits
  4. Describe HiPerGator's three main storage systems and the appropriate use for each
  5. Use the module system for loading application environments
  6. Locate where to receive user support
  7. Identify common user mistakes and how to avoid them.


Module 1: Introduction to Research Computing and HiPerGator

HiPerGator

  • 46,000 cores
  • Hundreds of GPUs
  • 10 Petabytes of storage
  • New HiPerGator AI cluster will add
    • 1,120 NVIDIA A100 GPUs
    • 17,000 AMD Rome Epyc Cores

For additional information visit our website: https://www.rc.ufl.edu/

Summary: HiPerGator is a large, high-performance compute cluster capable of tackling some of the largest computational challenges, but users need to understand how to responsibly and efficiently use the resources.

Investor Supported

HiPerGator is heavily subsidized by the university, but we do require faculty researchers to make investments for access. Research Computing sells three main products:

  1. Compute: NCUs (Normalized Compute Units)
    • 1 CPU core and 3.5 GB of RAM
  2. Storage:
    • Blue: High-performance
    • Orange: Intended for archival use
  3. GPUs
    • Sold in units of GPU cards
    • NCU investment also required to make use of GPU
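
As a rough illustration of the NCU definition above, the following sketch works out the cores and RAM that correspond to a given number of NCUs. It assumes resources scale linearly with the NCU count; the investment size of 10 NCUs is a hypothetical value chosen for the example.

```shell
#!/bin/bash
# One NCU is 1 CPU core and 3.5 GB of RAM (from the list above).
NCUS=10   # hypothetical investment size, for illustration only

# Assumption: cores scale one-to-one with NCUs.
CORES=$NCUS

# Bash arithmetic is integer-only, so work in tenths of a GB (3.5 GB = 35 tenths).
RAM_TENTHS=$(( NCUS * 35 ))
RAM_GB="$(( RAM_TENTHS / 10 )).$(( RAM_TENTHS % 10 ))"

echo "${NCUS} NCUs -> ${CORES} cores, ${RAM_GB} GB RAM"
```

Changing `NCUS` shows how much memory is available alongside a given number of cores, which is useful when deciding how many cores and how much RAM a single job should request.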

Investments can be either hardware investments, lasting 5 years, or service investments, lasting 3 months or longer.

Module 2: How to Access and Run Jobs

Cluster Components

  1. Login servers
  2. SLURM Scheduler
  3. Compute Cluster

Accessing HiPerGator

  • ssh to host hpg.rc.ufl.edu
  • jhub.rc.ufl.edu (requires UF Network)
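
A minimal sketch of an SSH login from a terminal, assuming the login host hpg.rc.ufl.edu. The username below is hypothetical and should be replaced with your own account name. The command is printed rather than executed, so the sketch can be tried on any machine.

```shell
#!/bin/bash
# Hypothetical username; replace with your own account name.
HPG_USER="albert.gator"
# Login host for HiPerGator.
HPG_HOST="hpg.rc.ufl.edu"

# Build the login command. It is only printed here; run the printed
# command yourself to open a session.
LOGIN_CMD="ssh ${HPG_USER}@${HPG_HOST}"
echo "$LOGIN_CMD"
```

Running the printed command in a terminal with an SSH client installed should prompt for your credentials and open a session on a login server.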

Proper use of Login Nodes

Generally speaking, interactive work other than managing jobs and data is discouraged on the login nodes. Login nodes are intended for file and job management, and short-duration testing and development.

See more information here.

Acceptable use limits:

  • No more than 16 cores
  • No longer than 10 minutes (wall time)
  • No more than 64 GB of RAM
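
The limits above can be expressed as a quick self-check before starting work on a login node. This is only a sketch: `fits_login_node` is a hypothetical helper written for this page, not a tool provided on HiPerGator, and the estimates passed to it are your own.

```shell
#!/bin/bash
# Acceptable use limits quoted in the list above.
MAX_CORES=16
MAX_MINUTES=10
MAX_MEM_GB=64

# fits_login_node CORES MINUTES MEM_GB
# Hypothetical helper: returns 0 if the planned task stays within all
# three limits, non-zero otherwise.
fits_login_node() {
    local cores=$1 minutes=$2 mem_gb=$3
    [ "$cores" -le "$MAX_CORES" ] &&
    [ "$minutes" -le "$MAX_MINUTES" ] &&
    [ "$mem_gb" -le "$MAX_MEM_GB" ]
}

# A short 4-core test using 8 GB for 5 minutes is within the limits.
if fits_login_node 4 5 8; then
    echo "OK for a login node"
fi

# A 32-core, 2-hour, 128 GB run is not.
if ! fits_login_node 32 120 128; then
    echo "Request scheduler resources instead"
fi
```

Anything that fails the check belongs on resources requested through the scheduler rather than on a login node.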

Resources for Scheduling a Job

For use beyond what is acceptable on the login servers, you can request resources on development servers or GPU servers, work through JupyterHub, Galaxy, or Graphical User Interface servers via Open on Demand, or submit batch jobs. All of these services work with the scheduler to allocate your requested resources so that your computations run efficiently and do not impact other users.

Scheduling a Job

  1. Understand the resources that your analysis will use:
    • CPUs: Can your job use multiple CPU cores? Does it scale?
    • Memory: How much RAM will it use? Requesting more will not make your job run faster!
    • GPUs: Does your application use GPUs?
    • Time: How long will it run?
  2. Request those resources:
    • See sample job scripts
    • Watch the HiPerGator: SLURM Submission Scripts training video. This video is approximately 30 minutes and includes a demonstration.
    • Watch the HiPerGator: SLURM Submission Scripts for MPI Jobs training video. This video is approximately 26 minutes and includes a demonstration.
    • Open on Demand, JupyterHub and Galaxy each have their own mechanisms for requesting resources, because SLURM needs this information to schedule your job.
  3. Submit the Job
    • Either using `sbatch` on the command line or through one of the interfaces
    • Once your job is submitted, SLURM will check that there are resources available in your group and schedule the job to run.
  4. Run
    • SLURM will work through the queue and run your job.
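
The steps above can be sketched as a minimal batch script. This is an illustration only, not an official template: the job name, the resource values, and the log file name are hypothetical, and a GPU request is omitted because it depends on the application. The `#SBATCH` lines are ordinary comments to bash, so the script also runs outside SLURM for a quick syntax check.

```shell
#!/bin/bash
#SBATCH --job-name=example_job        # hypothetical job name
#SBATCH --ntasks=1                    # one task (process)
#SBATCH --cpus-per-task=4             # CPUs: request only cores the program can use
#SBATCH --mem=8G                      # Memory: total RAM for the job
#SBATCH --time=01:00:00               # Time: wall-time limit (HH:MM:SS)
#SBATCH --output=example_job_%j.log   # log file; %j expands to the job ID

# SLURM sets SLURM_CPUS_PER_TASK inside the job; fall back to 1 elsewhere.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "Started: $(date)"
echo "Using ${THREADS} CPU core(s)"

# Replace the line below with your actual analysis command.
echo "Analysis would run here"
```

Assuming the script is saved as `example_job.sh`, it would be submitted with `sbatch example_job.sh`, and `squeue -u $USER` lists your queued and running jobs.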