Getting Started
[[Category:Basics]]
 
{|align=right
   |__TOC__
   |}
Welcome to UF Research Computing! This page is intended to help new clients get started on HiPerGator.

=From Zero to HiPerGator=

==Initial Consult==
If you need a face-to-face discussion of the group's needs, you can [https://www.rc.ufl.edu/get-support/walk-in-support/ meet one of the UF Research Computing Facilitators] in person or virtually, or [https://support.rc.ufl.edu/ submit a support request to start the conversation].

==HiPerGator Accounts==
The group's sponsor must be the first person to [https://www.rc.ufl.edu/access/account-request/ request a HiPerGator account], indicating that they are a new sponsor. In the process we will create their sponsored group.
Afterwards, group members will be able to [https://www.rc.ufl.edu/access/account-request/ submit HiPerGator account requests], indicating their PI as the sponsor. Once approved, their Linux accounts will be created.

==Trial Allocation==
We recommend that the group's sponsor [https://gravity.rc.ufl.edu/access/request-trial-allocation/ requests a '''free''' trial allocation] of storage and computational resources to get the group started on HiPerGator. Group members can then use HiPerGator for the 3-month duration of the trial allocation to figure out what resources and applications they really need.

==Purchasing Resources==
After or while using a trial allocation to determine the computational and storage resources it needs, the group's sponsor can submit a purchase request for [https://gravity.rc.ufl.edu/access/purchase-request/hpg-hardware/ hardware (5 years)] or [https://gravity.rc.ufl.edu/access/purchase-request/hpg-service/ services (3 months or longer)] to invest in the resources that cover the group's HiPerGator use.
Some groups may have access to shared departmental allocations. In this case, instead of purchasing resources, group members can [https://support.rc.ufl.edu/ request] to be added to the departmental group to gain access to the shared resources.

Some examples of departments with shared allocations include the [http://ufgi.ufl.edu/ Genetics Institute], [http://epi.ufl.edu/ Emerging Pathogens Institute], [https://stat.ufl.edu/ Statistics Department], [http://biostat.ufl.edu/ Biostatistics Department], [https://www.eng.ufl.edu/ccmt/ Center for Compressible Multiphase Turbulence (CCMT)], [https://chp.phhp.ufl.edu/research/affiliated-centers/center-for-cognitive-aging-memory-cam/ Cognitive Aging and Memory Clinical Translational Research Program (CAMCTRP)], [https://efrc.ufl.edu/ Center for Molecular Magnetic Quantum Materials], [https://www.phys.ufl.edu/ Physics Department], and [https://plantpath.ifas.ufl.edu/ Plant Pathology Department]. In addition, several research groups working on collaborative projects have shared allocations accessible to members of those projects.
'''At this point a group is established on HiPerGator and can continue its computational work. See below for more details on basic use.'''

=Introduction to Using HiPerGator=
;Note: see the short [[Quick Start Guide]] for hints on getting going and avoiding common pitfalls.
To use HiPerGator or HiPerGator-AI you need three basic parts:

* Interfaces
You use interfaces to interact with the system, manage data, initialize computation, and view results. The main categories of interfaces are the command line (also known as the Terminal), graphical user interfaces, and web interfaces or applications for more specialized use. Some distinctions here are blurred because, for example, you can open a Terminal while using a web interface like [[JupyterHub]] or [[Open OnDemand]], but mostly you use a command-line Terminal interface through SSH connections (see below).

* Data Management
To perform research analyses you need to [[Transfer_Data|upload]] and [[Storage|manage]] data. Note that misuse of the storage systems is the second main reason for account suspension, after running analyses on login nodes.

* Computation
'''Warning:''' do not run full-scale (normal) analyses on login nodes. [[Development and Testing]] is required reading. The main way to run computational analyses is to write [[Sample SLURM Scripts|job scripts]] and send them to the [[SLURM_Commands|scheduler]] to run. Some interfaces like [[Open OnDemand]], [[JupyterHub]], and [[Galaxy]] can manage job scheduling behind the scenes and may be more convenient than command-line job submission when appropriate.
  

==Interfaces==
===Connecting to a HiPerGator Terminal via SSH===
To work on HiPerGator you will have to connect to it from your local computer, either via SSH (terminal session) or via one of the web/application interfaces we provide, such as [[Galaxy]], [[Open_OnDemand|Open OnDemand]], or [[JupyterHub]].

For any given command below, <code><username></code> should be replaced with your UFRC username (the same as your GatorLink username).

====Connecting from Windows====
<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
''Expand this section to view instructions for logging in with Windows.''
<div class="mw-collapsible-content" style="padding: 5px;">
Since Microsoft Windows does not come with a built-in SSH client, you must download a client from the web.

For University-managed computers, [http://www.chiark.greenend.org.uk/~sgtatham/putty PuTTY], [https://tabby.sh/ Tabby], and [https://gitforwindows.org/ Git Bash] are approved for 'fast track' installations.

'''PuTTY'''
* [http://www.chiark.greenend.org.uk/~sgtatham/putty Download PuTTY] to your local machine and start the program.
* Connect to hpg.rc.ufl.edu.
* At the login prompt, enter your username (this should be the same as your GatorLink username).
* Enter your password when prompted. You are now connected and ready to work!
  

'''Tabby'''
* [https://github.com/Eugeny/tabby/releases/latest Download Tabby] to your local machine: tabby-version#-setup.exe, or tabby-version#-portable.zip for a portable version.
* Start the program and click Settings > Profiles > +New profile > SSH connection
  Name: HiPerGator
  Host: hpg.rc.ufl.edu
  Username: <username>
  Password: "Set password" or "Add a private key"
* Click "Save"
* Click on the window icon "New tab with profile" and select "HiPerGator hpg.rc.ufl.edu"
* You are now connected and ready to work!
</div>
</div>

====Connecting from Linux and MacOS====
<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
''Expand to view instructions for connecting from Linux or MacOS.''
<div class="mw-collapsible-content" style="padding: 5px;">
Open a terminal and run
<pre>
ssh <username>@hpg.rc.ufl.edu
</pre>
Enter your password when the prompt appears. You are now connected and ready to work!
</div>
</div>

==Data Management==
  

===Transferring Data===
To transfer datasets between HiPerGator and your local computer or another external location, you have to pick the appropriate transfer mechanism.

====SFTP====
SFTP, or secure file transfer, works well for small to medium data transfers and is appropriate for both small and large data files.
  
If you would like to use a graphical secure file transfer client, we recommend:
* <s>[https://filezilla-project.org/download.php?show_all=1 FileZilla] for Windows or MacOS X.</s>
: The FileZilla installer contains adware/malware and should probably be avoided.
* [http://winscp.net/eng/index.php WinSCP] for Windows.
* [http://cyberduck.io/ Cyberduck] for MacOS X and Windows.

After you have chosen and downloaded a client, configure it to connect to <code>hpg.rc.ufl.edu</code>, specifying port number 22. Use your username and password to log in.
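
If you prefer the command line, the OpenSSH <code>sftp</code> client (included with Linux, MacOS, and recent Windows versions) works as well. The session below is a minimal sketch; the file names are hypothetical and <code><username></code> stands in for your own:
<pre>
$ sftp <username>@hpg.rc.ufl.edu
sftp> put mydata.csv        # upload a file from your local machine
sftp> get results.tar.gz    # download a file from HiPerGator
sftp> exit
</pre>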
  

====Rsync====
If you prefer the command line or want maximum efficiency from your data transfers, Rsync is a good choice. It is an incremental file transfer utility that minimizes network usage by transmitting only the differences between local and remote files, rather than transmitting complete files every time a sync is run, as SFTP does. Rsync is best used for tasks like synchronizing files stored across multiple subdirectories or updating large data sets. It works well for both small and large files. [[Rsync|See the Rsync page]] for instructions on using rsync.
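
As an illustration, a one-way sync of a local project directory to your HiPerGator home directory might look like the following (the directory names are hypothetical; <code>-a</code> preserves permissions and timestamps, <code>-v</code> is verbose, and <code>-z</code> compresses data in transit):
<pre>
$ rsync -avz ~/projects/myproject/ <username>@hpg.rc.ufl.edu:~/myproject/
</pre>
Running the same command again transfers only the files that changed since the last sync.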
  

====Globus====
Globus is a high-performance mechanism for file transfer. Globus works especially well for transferring large files or data sets.
* [[Globus|See the Globus page]] for setup and configuration information.
  

====Samba====
The Samba service, also known as a '<code>network share</code>' or '<code>mapped drive</code>', provides the ability to connect to some HiPerGator filesystems as locally mapped drives (or mount points on Linux or MacOS X). Once you have connected to a share, this mechanism lets you use your client computer's native file manager to access and manage your files. The UFRC Samba setup does not provide high performance, so use it sparingly and for smaller files, like job scripts or analysis reports to be copied to your local system. You must be connected to the UF network (either on campus or through the [[VPN]]) to connect to Samba shares.
* [[Samba_Access|See the page on accessing Samba]] for setup information specific to your computer's operating system.
  

====Automounted Paths====
Note: NFS-based storage on our systems is typically automounted, which means it is dynamically mounted only when users are actually accessing it. For example, if you have an invested folder at /orange/smith, you will have to type in the full path "/orange/smith" to see its contents and access them. Directly browsing /orange will not show the smith sub-folder unless someone else happens to be using it at the time. Automounted folders are common on our systems; they include /orange, /bio, /rlts, and even /home.
  

==Editing your files==
Several methods exist for editing your files on the cluster.
===Native Editors===
* '''vi''' - The visual editor (vi) is the traditional Unix editor; however, it is not necessarily the most intuitive editor. [http://www.eng.hawaii.edu/Tutor/vi.html View a tutorial for using vi]
* '''emacs''' - Emacs is a much heavier-duty editor, but again has the problem of having commands that are non-intuitive. [http://www2.lib.uchicago.edu/~keith//tcl-course/emacs-tutorial.html View a tutorial for using emacs]
* '''pico''' - While pico is not installed on the system, nano is installed and is a pico work-alike.
* '''nano''' - Nano has a good bit of on-screen help to make it easier to use.
  

===External Editors===
You can also use your favorite file editor on your local machine and then transfer the files to the cluster afterward. A caveat is that files created on Windows machines usually contain unprintable characters, which may be misinterpreted by Linux command interpreters (shells). If this happens, there is a utility called <code>dos2unix</code> that you can use to convert the text file from DOS/Windows formatting to Linux formatting.
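
For example, converting a job script that was written on Windows is a single command (the file name here is hypothetical):
<pre>
$ dos2unix my_job_script.sh
</pre>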
  

==Computation==
===Using installed software===
The full list of software available for use can be viewed on the [[Installed_Software|Installed Software]] page. Access to installed software is provided through [[Modules|Environment Modules]].

The following command can be used to browse the full list of available modules, along with short descriptions of the applications they make available:
<pre>
module spider
</pre>

To load a module, use the following command:
<pre>
module load <module_name>
</pre>

For more information on loading modules to access software, view the page on the [[Modules_Basic_Usage|basic usage of environment modules]].

There are some useful commands and utilities in a [[UFRC_environment_module|'ufrc' environment module]] in addition to installed applications.
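
As a sketch, loading it works the same way as any other module (the utilities it provides are listed on the linked page):
<pre>
module load ufrc
</pre>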
  

===Interactive Testing or Development===
You don't always have to use the SLURM scheduler. When all you need is a quick shell session to run a command or two, write and/or test a job script, or compile some code, use [[Development_and_Testing|SLURM Dev Sessions]].
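
A minimal sketch of requesting an interactive SLURM session is shown below; the resource values are arbitrary examples, and the linked page above documents the recommended approach:
<pre>
$ srun --ntasks=1 --cpus-per-task=1 --mem=2gb --time=01:00:00 --pty bash -i
</pre>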
  

===Running Graphical Programs===
It is possible to run programs that use a graphical user interface (GUI) on the system. However, doing so requires installation and configuration of additional software on the client computer.
  

Please see the [[GUI_Programs|GUI Programs]] page for information on running graphical user interface applications at UFRC.
  

===Scheduling computational jobs===
UFRC uses the Simple Linux Utility for Resource Management, or '''SLURM''', to allocate resources and schedule jobs. Users can create SLURM job scripts to submit jobs to the system. These scripts can, and should, be modified to control several aspects of your job, such as resource allocation, email notifications, and the output destination.
  

* See the [[Annotated_SLURM_Script|Annotated SLURM Script]] for a walk-through of the basic components of a SLURM job script
* See the [[Sample_SLURM_Scripts|Sample SLURM Scripts]] for several SLURM job script examples
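
As a quick illustration, a minimal job script consists of a few <code>#SBATCH</code> directives followed by the commands to run. This sketch uses arbitrary example values and a hypothetical module and script name; see the pages above for authoritative templates:
<pre>
#!/bin/bash
#SBATCH --job-name=test_job          # a name for the job
#SBATCH --mail-type=END,FAIL         # email notifications
#SBATCH --mail-user=<username>@ufl.edu
#SBATCH --ntasks=1                   # number of tasks (processes)
#SBATCH --cpus-per-task=1            # cores per task
#SBATCH --mem=2gb                    # total memory for the job
#SBATCH --time=00:05:00              # walltime limit (HH:MM:SS)
#SBATCH --output=test_job_%j.log     # standard output and error log

date; hostname; pwd
module load python                   # hypothetical module name
python my_script.py                  # hypothetical script
</pre>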

To submit a job script from one of the login nodes accessed via hpg.rc.ufl.edu, use the following command:
<pre>
$ sbatch <your_job_script>
</pre>

To check the status of submitted jobs, use the following command:
<pre>
$ squeue -u <username>
</pre>

View [[SLURM_Commands]] for more useful SLURM commands.

====Managing Cores and Memory====
See [[Account and QOS limits under SLURM]] for the main documentation on efficient management of computational resources.

The amount of resources within an investment is calculated in NCUs (Normalized Computing Units); each NCU purchased corresponds to 1 CPU core and about 3.5 GB of memory. CPUs (cores) and RAM are allocated to jobs independently, as requested by your job script.
  

Your group's investment can run out of '''cores''' (SLURM may show <code>QOSGrpCpuLimit</code> as the reason a job is pending) or '''memory''' (SLURM may show <code>QOSGrpMemLimit</code>), depending on current use by running jobs.

The majority of HiPerGator nodes have the same ratio of about 4 GB of RAM per core, which, after accounting for the operating system and system services, leaves about 3.5 GB usable for jobs; hence the ratio of 1 core and 3.5 GB of RAM per NCU.

Most HiPerGator nodes have 32 cores and 128 GB RAM (~30,000 cores in the newer part of the cluster) or 64 cores and 256 GB RAM (~16,000 cores in the older part of the cluster). The [[Large-Memory SMP Servers|bigmem]] nodes and the newer Skylake nodes have higher ratios of 16 GB/core and 6 GB/core, respectively. See [[Available_Node_Features]] for the exact data on resources available on all types of nodes on HiPerGator.

You must specify both the number of cores and the amount of RAM needed in the job script for SLURM, using the <code>--mem</code> (total job memory) or <code>--mem-per-cpu</code> (per-core memory) options. Otherwise, the job will be assigned the default of 600 MB of memory.

If you need more than 128 GB of RAM, you can only run on the older nodes, which have 256 GB of RAM, or on the bigmem nodes, which have up to 1.5 TB of RAM.
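
For example, the two memory directives would appear in a job script as follows (the values are arbitrary examples; use one option or the other, not both):
<pre>
#SBATCH --ntasks=4
#SBATCH --mem=14gb            # total memory for the whole job
</pre>
or
<pre>
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=3500mb  # memory per allocated core
</pre>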
  

See [[Account and QOS limits under SLURM]] for an extensive explanation of QOS and SLURM account use.

==Getting help==
If you are having problems using the UFRC system, please let our staff know by submitting a [http://support.rc.ufl.edu support request].
  

==Visual Overview==
The diagram below shows a high-level overview of HiPerGator use; each part is covered in the sections above.

[[file:HiPerGator.png|800px]]