HPG Data Management

From UFRC
Latest revision as of 20:27, 5 September 2024

Warning: Only run workloads from blue storage. Blue is a fast storage system that can handle the I/O involved in research workloads. Before using sbatch or launching a workload interactively, make sure your working directory is a blue file path, e.g. /blue/<group>/<user>, and not your /orange or /home directory (~ or /home/<user>). Use pwd to print your working directory.
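The check described in the warning can be sketched as a small shell guard. This is a minimal illustration, not an official UFRC tool: the function name is_blue_path is hypothetical, and the check is a plain string test on the path printed by pwd, so it does not resolve symbolic links.

```shell
# is_blue_path PATH: succeed only if PATH lies under /blue/.
# (Hypothetical helper name; a plain prefix test, symlinks are not resolved.)
is_blue_path() {
    case "$1" in
        /blue/*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Check the current working directory before submitting or launching work.
if is_blue_path "$(pwd)"; then
    echo "OK: $(pwd) is on blue storage"
else
    echo "WARNING: $(pwd) is not under /blue; cd to /blue/<group>/<user> first" >&2
fi
```

Placed near the top of a job script, a guard like this makes it harder to launch a workload from /home or /orange by accident.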

To perform research analyses, you need to upload and manage data. Note that misuse of the storage systems is the second most common reason for account suspension, after running analyses on login nodes.

Transferring Data

If you need to transfer datasets between HiPerGator and your local computer or another external location, you have to pick the appropriate transfer mechanism.

  • For small or medium file transfers, use sftp, scp, or rsync to connect to sftp.rc.ufl.edu, hpg.rc.ufl.edu, or rsync.rc.ufl.edu.
  • For large file transfers or transfers with many small files, use our Globus service.

For more in-depth information see Transfer Data.

SFTP

SFTP, or secure file transfer, works well for small to medium-sized data transfers and can handle both small and large individual files.

If you would like to use a Graphical User Interface secure file transfer client we recommend:

After you have chosen and downloaded a client, configure the client to connect to hpg.rc.ufl.edu, specifying port number 22. Use your username and password to log in.
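If you prefer a terminal to a graphical client, the same connection can be made with the OpenSSH command-line tools. The sketch below is an illustration under assumptions: the username jdoe, the group mygroup, and the file results.csv are hypothetical placeholders, and the small run wrapper only prints each command unless DRY_RUN=0 is set, so nothing is transferred until you have reviewed the commands. Note that sftp and scp take the port with an uppercase -P.

```shell
HPG_USER="jdoe"              # hypothetical username; use your own
DEST="/blue/mygroup/jdoe"    # hypothetical /blue/<group>/<user> directory

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Interactive SFTP session on port 22.
run sftp -P 22 "${HPG_USER}@hpg.rc.ufl.edu"

# One-off copy of a single local file to HiPerGator with scp, also on port 22.
run scp -P 22 results.csv "${HPG_USER}@hpg.rc.ufl.edu:${DEST}/"
```

Run the script once as written to inspect the commands, then rerun it with DRY_RUN=0 in the environment to execute them.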

Samba

The Samba service, also known as a 'network share' or 'mapped drive', lets you connect to some HiPerGator filesystems as locally mapped drives (or mount points on Linux or macOS). Once you have connected to a share, you can use your client computer's native file manager to access and manage your files. The UFRC Samba setup does not provide high performance, so use it sparingly and for smaller files, such as job scripts or analysis reports that you want to copy to your local system. You must be connected to the UF network (either on campus or through the UF VPN) to connect to Samba shares.

Rsync

If you prefer the command line or want maximum efficiency from your data transfers, Rsync is a good choice. It is an incremental file transfer utility that minimizes network usage by transmitting only the differences between local and remote files, rather than transmitting complete files every time a sync is run, as SFTP does. Rsync is best used for tasks like synchronizing files stored across multiple subdirectories or updating large data sets. It works well for both small and large files. See the Rsync page for instructions on using rsync.
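As a rough illustration of the incremental behaviour described above, the sketch below wraps two rsync calls in a function: a dry run that lists what would be sent, followed by the real transfer. The function name sync_dir and the user, group, and directory names in the usage comment are hypothetical; the Rsync page remains the reference for the recommended options.

```shell
# sync_dir SRC DEST: preview the changes, then copy only what differs.
# (Hypothetical helper name, shown as a sketch.)
sync_dir() {
    # -a: archive mode (recurse, keep times and permissions)
    # -v: list the files involved
    # -n: dry run, report what would be sent without copying anything
    rsync -avn "$1" "$2" || return 1
    # Real transfer; --partial keeps partly sent files so a rerun can resume.
    rsync -av --partial "$1" "$2"
}

# Example usage (hypothetical user, group, and directory names):
# sync_dir ./dataset/ jdoe@rsync.rc.ufl.edu:/blue/mygroup/jdoe/dataset/
```

Running the same sync_dir call a second time should list few or no files, because unchanged files are skipped.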

Globus

Globus is a high-performance mechanism for file transfer. It works especially well for transferring large files or data sets.

Automounted Paths

Note: NFS-based storage on our systems is typically automounted, which means it is dynamically mounted only when users are actually accessing it. For example, if you have an invested folder at /orange/smith, you have to type the full path "/orange/smith" to see and access its contents. Directly browsing /orange will not show the smith sub-folder unless someone else is using it at that moment. Automounted folders are very common on our systems and include /blue, /orange, /bio, /rlts, and even /home.
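In practice, this means you address an automounted folder by its full path rather than browsing down to it. The sketch below uses the /orange/smith example from the note; the function name list_automounted is hypothetical, and it assumes that testing the full path is enough to trigger the mount, as described above.

```shell
# list_automounted DIR: access DIR by its full path, then list its contents.
# (Hypothetical helper name, shown as a sketch.)
list_automounted() {
    # Testing the full path is what triggers the automount.
    if [ -d "$1" ]; then
        ls "$1"
    else
        echo "cannot access $1" >&2
        return 1
    fi
}

# ls /orange                      # may not show "smith" if it is not mounted yet
# list_automounted /orange/smith  # full path: mounts the folder, then lists it
```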