Difference between revisions of "FAQ"

From UFRC

Revision as of 15:20, 13 June 2023

For questions about specific software such as Python, Open OnDemand, or custom installations, visit the Applications FAQ.

Accounts and Investment

Q: How do I get a HiPerGator account?

A: HPG accounts must be requested via the account request form (https://www.rc.ufl.edu/get-started/hipergator/request-hipergator-account/) and approved by a valid sponsor.

Q: How do I purchase HiPerGator resources or reinvest in expired allocations?

A: If you're a sponsor or account manager, please fill out a purchase form at https://www.rc.ufl.edu/get-started/purchase-allocation/

Q: How do I add users to a group?

A: To gain access to a group, a user must submit a ticket via the RC Support Ticketing System (https://support.rc.ufl.edu/enter_bug.cgi) with a subject line similar to "Add (username) to (groupname) group".

Q: I can't log in to my HPG account.

A: Visit our Blocked Accounts wiki page (https://help.rc.ufl.edu/doc/Blocked_Accounts).

Q: How can I find out which allocations have expired or are about to expire?

A: Please use the showAllocation tool in the 'ufrc' env module. See UFRC_environment_module for reference on all HPG tools.

Storage

Q: I can't see my (or my group's) /blue or /orange folders!

A: Listing /blue or /orange will not show your group's directory tree. The directory is connected (mounted) automatically when you access it by its full path, for example with an 'ls' or 'cd' command. If your group name is 'mygroup', list or cd into /blue/mygroup or /orange/mygroup. See also this short video: https://web.microsoftstream.com/video/87698fe6-84df-40dc-9d22-c3a6c63820fa
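The same behavior can be seen from a script. The following is a minimal Python sketch, not an official RC tool: the function names are hypothetical, and 'mygroup' is the placeholder group name from the answer above. It accesses the full group paths directly instead of listing the /blue or /orange parent directories.

```python
import os


def group_dirs(group):
    """Return the full /blue and /orange paths for a group name."""
    return [f"/{fs}/{group}" for fs in ("blue", "orange")]


def check_group_dirs(group):
    """Access each full path directly.

    On HiPerGator, touching the full path is what triggers the mount;
    on any other machine the paths will simply be reported as missing.
    """
    return {path: os.path.isdir(path) for path in group_dirs(group)}


if __name__ == "__main__":
    # 'mygroup' is a placeholder; substitute your own group name.
    for path, present in check_group_dirs("mygroup").items():
        print(path, "accessible" if present else "not accessible")
```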

Q: Why do I see "No Space Left" in job output or application error?

A: If you see a 'No Space Left' or similar message (no quota remaining, etc.), look closely at the path(s) in the error message for 'home', 'orange', 'blue', or 'red', then check the quota for that filesystem. All quota commands are in the 'ufrc' environment module and include 'home_quota', 'blue_quota', and 'orange_quota'. See Getting Started and Storage for more help.
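As an illustration of that lookup, here is a small Python sketch that picks the quota command from a path found in an error message. The function name and the sample messages are hypothetical; only the three quota commands named above are mapped, so a 'red' path returns None here.

```python
import re

# Quota commands named in the answer above. No command is listed there
# for 'red', so it is intentionally left out of this mapping.
QUOTA_COMMANDS = {
    "home": "home_quota",
    "blue": "blue_quota",
    "orange": "orange_quota",
}


def quota_command_for(message):
    """Return the quota command for the first filesystem named in a path, or None."""
    match = re.search(r"/(home|blue|orange|red)/", message)
    if match is None:
        return None
    return QUOTA_COMMANDS.get(match.group(1))


if __name__ == "__main__":
    # Illustrative error text; real messages vary by application.
    error = "OSError: [Errno 28] No space left on device: '/blue/mygroup/user/out.dat'"
    print(quota_command_for(error))
```

The returned command would then be run in a terminal after loading the 'ufrc' environment module.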

A convenient interactive tool for seeing what is taking up the storage quota is the ncdu command in a bash terminal. You can run that command, then delete data or move it to different storage to free up space.

If the data that's taking up most of the space is related to application environments and packages such as conda, pip, or singularity, you can modify your configuration file to update the default directories for custom installs. You can find more information about the .condarc setup here: Conda

Performance

Q: Why is HiPerGator running so slow?

A: There are many reasons why users may experience unusually low performance while using HPG. First, users should make sure that the performance issues do not originate from their Internet service provider, home network, or personal devices.


Once the possible causes above are ruled out, users should report the issue as soon as possible via the RC Support Ticketing System. When reporting the issue, please include detailed information such as:

  • Time when the issue occurred
  • JobID
  • Nodes being used, i.e. username@hpg-node$. Note: Login nodes are not considered high performance nodes and intense jobs should not be executed from them.
  • Paths, file names, etc.
  • Operating system
  • Method for accessing HPG: JupyterHub, Open OnDemand, or the terminal interface used.


Q: Are there profiling tools installed on HiPerGator that help identify performance bottlenecks?

A: REMORA is the most generic profiling tool we have on the cluster. More specific tools depend on the application stack or the language, e.g. cProfile for Python code, Nsight Compute for CUDA apps, or VTune for C/C++ and MPI code.
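For the Python case, a minimal cProfile sketch might look like the following. The functions slow_sum and profile_call are hypothetical stand-ins for your own code; cProfile and pstats are part of the Python standard library.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum, 100_000)
    print(report)
```

An existing script can also be profiled without changes by running it with `python -m cProfile -s cumulative script.py`, where script.py is a placeholder name.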


Q: Why is my job still pending?

A: According to the SLURM documentation, when a job cannot be started, the reason is recorded immediately in the job's "reason" field in the squeue output, and the scheduler moves on to the next job to consider.

Related article: Account and QOS limits under SLURM (https://help.rc.ufl.edu/doc/Account_and_QOS_limits_under_SLURM)

Common reasons why jobs are pending:

  • Priority: Resources are being reserved for a higher priority job. This is particularly common with Burst QOS jobs.
  • Resources: Required resources are in use.
  • Dependency: Job dependencies are not yet satisfied.
  • Reservation: Waiting for an advanced reservation.
  • AssociationJobLimit: User or account job limit reached.
  • AssociationResourceLimit: User or account resource limit reached.
  • AssociationTimeLimit: User or account time limit reached.
  • QOSJobLimit: Quality Of Service (QOS) job limit reached.
  • QOSResourceLimit: Quality Of Service (QOS) resource limit reached.
  • QOSTimeLimit: Quality Of Service (QOS) time limit reached.
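To look up the reason for your own pending jobs, something like the following Python sketch could be used. It is an illustration rather than an RC-provided tool: the function names are hypothetical, the reason names are the ones listed above (the strings your SLURM version reports may differ), and it assumes squeue is on the PATH and accepts the standard -u, -t, -h, and -o options with the %i (job ID) and %r (reason) format fields.

```python
import subprocess

# Reason names and explanations as listed in this FAQ.
PENDING_REASONS = {
    "Priority": "Resources being reserved for a higher priority job",
    "Resources": "Required resources are in use",
    "Dependency": "Job dependencies not yet satisfied",
    "Reservation": "Waiting for advanced reservation",
    "AssociationJobLimit": "User or account job limit reached",
    "AssociationResourceLimit": "User or account resource limit reached",
    "AssociationTimeLimit": "User or account time limit reached",
    "QOSJobLimit": "Quality Of Service (QOS) job limit reached",
    "QOSResourceLimit": "Quality Of Service (QOS) resource limit reached",
    "QOSTimeLimit": "Quality Of Service (QOS) time limit reached",
}


def parse_squeue(text):
    """Parse lines of '<jobid> <reason>' into a {jobid: reason} dict."""
    reasons = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Drop surrounding parentheses in case the reason is wrapped in them.
            reasons[parts[0]] = parts[1].strip().strip("()")
    return reasons


def explain(reason):
    """Return the FAQ explanation for a reason, with a fallback for others."""
    return PENDING_REASONS.get(reason, "See the SLURM documentation for this reason")


def pending_jobs(user):
    """Ask squeue for a user's pending jobs (requires SLURM to be available)."""
    output = subprocess.run(
        ["squeue", "-u", user, "-t", "PENDING", "-h", "-o", "%i %r"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_squeue(output)


if __name__ == "__main__":
    # Made-up sample output so the sketch runs without SLURM installed.
    sample = "12345 Priority\n12346 QOSResourceLimit\n"
    for job_id, reason in parse_squeue(sample).items():
        print(job_id, reason, "-", explain(reason))
```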