Difference between revisions of "Storage"

Revision as of 11:55, 1 June 2022

UF Research Computing maintains several shared storage systems that are intended for different user activities. A summary of the filesystems and their use can be found in the RC storage policy (https://www.rc.ufl.edu/documentation/policies/storage/). Here we discuss the practical use of the filesystems on HiPerGator.

Home Storage

Your home directory is the first directory you see when you log into HiPerGator. It is always available at the paths ~, /home/$USER, or $HOME. The shell variables above can be used in scripts. The home directories are the smallest storage devices available to our users. They contain files important for setting up the user shell environment and secure shell connections. Do not remove any .bash* files or the .ssh directory, or you will have problems using your HiPerGator account. Let us know (https://support.rc.ufl.edu) if any of them were removed by accident, so that we can reset the files to their standard versions.

The first rule of using the home directory is to not use it for reading or writing data files in any analyses run on HiPerGator. It is permissible to keep software builds, conda environments, text documents, and valuable scripts in $HOME as it is somewhat protected by daily snapshots.

Blue Storage

Blue Storage is our main high-performance parallel filesystem. This is where all job input/output (a.k.a. 'job i/o'), that is, reading and writing files, must happen. By default your personal directory tree starts at '/blue/GROUP/USER'. That directory cannot be modified by other group members. There is a shared directory at '/blue/GROUP/share' for groups that prefer to share all their data between group members. The parallel nature of Blue Storage makes it very efficient at reading and writing large files, which can be 'striped', or broken into pieces to be stored on different storage servers. It does not deal well with directories that have a large number of very small files. If a job produces those, it is advisable to use the Temporary Directories to alleviate the burden on Blue Storage and keep it responsive and performant for everyone. For groups that purchased separate storage for additional projects, the default path to the project directories is '/blue/PROJECT'. That directory is set up similarly to the 'share' directory in the primary group directory tree.
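To judge whether a job's output directory is dominated by very small files, one option is simply to count them. The following is a minimal sketch, not a UFRC tool: the helper name count_small_files and the default threshold of 1024 bytes are assumptions chosen for illustration. Point it at a single job directory rather than a whole group tree, since walking a large tree is itself a metadata-heavy operation.

```shell
# count_small_files is a hypothetical helper, not part of any UFRC module.
# Usage: count_small_files DIR [MAX_BYTES]
# Prints the number of regular files under DIR smaller than MAX_BYTES
# (default 1024 bytes, an arbitrary threshold chosen for this sketch).
count_small_files() {
    dir="$1"
    max_bytes="${2:-1024}"
    # "-size -Nc" matches files whose size is less than N bytes.
    # File names containing newlines would be miscounted by wc -l.
    find "$dir" -type f -size "-${max_bytes}c" | wc -l | tr -d ' '
}

# Example call with placeholder paths:
# count_small_files /blue/mygroup/myuser/job_output
```

If the count is large, writing those files to the job's temporary directory and copying back only an archive or the final results is one way to reduce the load.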

Storage Automounting

Do not be alarmed if you do not see your Blue Storage directory when looking in /blue at first. Blue Storage is connected (mounted) on demand, so you have to 'use' it before it becomes visible. For example, changing into your blue directory or listing its contents will connect it and make it visible, so use

cd /blue/mygroup

or

ls /blue/mygroup

to avoid surprises.
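In a script, the same access can be wrapped in a small check that both triggers the mount and reports whether the directory is reachable. This is a minimal sketch; ensure_mounted is a hypothetical helper name, and /blue/mygroup is the same placeholder path used above.

```shell
# ensure_mounted is a hypothetical helper, not a UFRC tool.
# Listing the directory is what triggers the on-demand mount.
ensure_mounted() {
    dir="$1"
    if ls "$dir" > /dev/null 2>&1; then
        echo "available: $dir"
    else
        echo "not available: $dir" >&2
        return 1
    fi
}

# Example call with the placeholder group path:
# ensure_mounted /blue/mygroup
```

A non-zero return value lets a job script stop early instead of failing later with a confusing "No such file or directory" error.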

If you are using Jupyter Notebook or another GUI or web application that makes it difficult to browse to a specific path, you can create a symlink (shortcut) as shown in the "Create the Link" section of the Jupyter Notebooks page.
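As a rough illustration of the general approach, a symlink can be created with the standard ln -s command. The helper name make_storage_link and the link name below are assumptions for this sketch; the Jupyter Notebooks page remains the reference for the recommended procedure.

```shell
# make_storage_link is a hypothetical helper, not a UFRC tool.
# Usage: make_storage_link TARGET LINK
make_storage_link() {
    target="$1"   # e.g. /blue/mygroup
    link="$2"     # e.g. "$HOME/blue_mygroup" (name chosen for illustration)
    # Refuse to overwrite anything that already exists at the link path.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "already exists: $link" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# Example call with placeholder paths:
# make_storage_link /blue/mygroup "$HOME/blue_mygroup"
```

The link itself lives in the home directory, but the data stays on Blue Storage, so following the link does not count as using home for job i/o.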

Orange Storage

Orange Storage is cheaper than Blue, but its hardware is also more limited. Therefore, Orange Storage cannot support the full brunt of the applications running on HiPerGator. Limit its use to long-term storage of data that is not currently in use, or to very gentle access such as serial reading of raw data for QC/filtering, with the output of that first step in many workflows going to your Blue Storage directory tree.

Storage Automounting

Do not be alarmed if you do not see your Orange Storage directory with 'ls' at first. Orange Storage is connected (mounted) on demand, so you have to 'use' it before it becomes visible. For example, changing into your orange directory or listing its contents will connect it and make it visible, so use

cd /orange/mygroup

or

ls /orange/mygroup

to avoid surprises.

Red Storage

Red storage is fully flash-based and can support high rates of i/o. The point to remember about Red storage is that the allocations are short-term and the data is removed within 24 hours of the allocation's end date. See the policy page for how to request an allocation.

Local Scratch Storage

All HiPerGator compute nodes have local storage. That storage is flash-based on HPG3 and newer nodes and can support high i/o rates. Older nodes use spinning disks with lower i/o rates compared to flash. Using local scratch storage on HiPerGator compute nodes (see Temporary Directories) is a way to insulate an analysis from most of the other jobs running on HiPerGator, which are generally using Blue storage. It may therefore be possible to get much higher i/o rates with $TMPDIR, because the job competes for local scratch i/o only with the limited number of jobs that run on the same compute node and also chose to use local scratch. The caveat is that using local scratch requires staging in the input data (copying it from /blue to $TMPDIR within a job) and staging out the results (copying the result files back to /blue), because the job's temporary scratch directory is automatically removed at the end of the job and any files left on it are irretrievably lost.
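The copy-in, compute, copy-out pattern described above might look like the following inside a job script. This is a minimal sketch under stated assumptions: the helper names copy_inputs_to_scratch and copy_results_to_blue and all paths are hypothetical, and the only thing taken from the text is that $TMPDIR points at the job's local scratch directory.

```shell
# Hypothetical helpers for illustration; names and paths are assumptions.

# Copy an input directory from Blue Storage into the job's scratch directory.
# Usage: copy_inputs_to_scratch /blue/GROUP/USER/input
copy_inputs_to_scratch() {
    cp -r "$1" "$TMPDIR/"
}

# Copy a results directory from scratch back to a Blue Storage destination.
# Usage: copy_results_to_blue results /blue/GROUP/USER/project
copy_results_to_blue() {
    mkdir -p "$2" && cp -r "$TMPDIR/$1" "$2/"
}

# Inside a job script (placeholder paths):
# copy_inputs_to_scratch /blue/mygroup/myuser/input
# cd "$TMPDIR"
# ... run the analysis, reading ./input and writing ./results ...
# copy_results_to_blue results /blue/mygroup/myuser/project
```

The copy back to /blue has to run before the job exits; anything still only in $TMPDIR at that point is gone.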

Checking Quotas and Managing Storage

The ufrc environment module has several tools useful for checking storage use and quotas as well as exploring directories and their space use.

  • home_quota - show your HiPerGator Home directory quota usage.
  • blue_quota - show HiPerGator Blue Storage (/blue) quota usage for your user and group.
  • orange_quota - show HiPerGator Orange Storage (/orange) quota usage for your project(s).
  • ncdu - an interactive program for showing directory sizes, browsing a directory tree, and removing files and directories in a terminal (ssh session).