Difference between revisions of "FAQ"

Revision as of 16:04, 13 February 2023

Storage

Q: I can't see my (or my group's) /blue or /orange folders!

A: Listing /blue or /orange directly will not show your group's directory tree. The directory is connected (mounted) automatically only when you access it, e.g. with an 'ls' or 'cd' command. For example, if your group name is 'mygroup', list or cd into /blue/mygroup or /orange/mygroup. See also this short video: https://web.microsoftstream.com/video/87698fe6-84df-40dc-9d22-c3a6c63820fa

Q: Why do I see "No Space Left" in my output?

A: If you see a 'No Space Left' or a similar message (no quota remaining, etc.), check the path(s) in the error message for 'home', 'orange', 'blue', or 'red', then check the quota for that filesystem. The quota commands are in the 'ufrc' environment module and include 'home_quota', 'blue_quota', and 'orange_quota'. See Getting Started and Storage for more help.

A convenient interactive tool for seeing what is taking up your storage quota is 'ncdu', available in the 'ufrc' environment module.
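For a non-interactive overview, a short script can total the size of each entry under a directory. This is a minimal sketch, not a replacement for 'ncdu' or the quota commands: the function names dir_sizes and largest are hypothetical, and it sums apparent file sizes, which may differ from how the filesystem accounts for quota.

```python
import os

def dir_sizes(root):
    """Return {entry name: total bytes} for each entry directly under root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if os.path.islink(path):
                        continue  # do not count symlink targets
                    try:
                        total += os.path.getsize(path)
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            sizes[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes

def largest(root, n=10):
    """Return the n largest entries under root as (name, bytes), biggest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

Calling largest(os.path.expanduser("~")) would list the ten largest entries in your home directory; hidden directories such as .conda or .local are included.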

If the data that's taking up most of the space is related to application environments and packages such as conda, pip, or singularity, you can modify your configuration file to update the default directories for custom installs. You can find more information about the .condarc setup here: Conda

If you are considering purchasing more storage, please visit the Purchase Allocation portal.

Applications

Python

Q: I installed a Python package via 'pip install PACKAGEX', but 'import PACKAGEX' results in an error.

A: A personal pip install puts the resulting package into the ~/.local/lib/pythonX.Y/site-packages directory tree under your home directory. Such an install often pulls a Python package from a binary archive (wheel) that was built on another system against software libraries that are not compatible with HiPerGator. A typical error message in such a case complains about the lack of a particular GLIBC version or some other missing library. Note that the issue can be exacerbated by an incompatible interaction between an environment loaded via an environment module ('module load something') and a personal Python package install. To avoid this issue, install the Python package into an isolated environment. Our approach for creating such environments depends on many factors, but usually results in a Conda or containerized environment.
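To check whether a failing import is being picked up from a personal install, it can help to ask Python where it would load the module from. The following is a minimal sketch using only the standard library; the helper names module_origin and from_user_site are hypothetical.

```python
import importlib.util
import os
import site

def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

def from_user_site(path, user_site=None):
    """Return True if path lies inside the per-user site-packages directory."""
    if path is None:
        return False
    # site.getusersitepackages() is typically ~/.local/lib/pythonX.Y/site-packages
    user_site = os.path.abspath(user_site or site.getusersitepackages())
    return os.path.commonpath([os.path.abspath(path), user_site]) == user_site
```

If from_user_site(module_origin("PACKAGEX")) returns True, the module is coming from your personal directory tree rather than from the loaded environment.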

Custom Installation

Q: I want to have a custom install of an application or Python modules.

A: We recommend creating a Conda environment and installing needed packages with the 'mamba' tool from the conda environment module. It is possible to mix conda and pip installed packages inside a conda environment as conda/mamba is aware of packages installed via pip, but not vice versa.

R

Q: How do I install R packages?

A: Users can install R packages in their local directory. The default directory is /home/my.username/R/x86_64-pc-linux-gnu-library/X.X/ (X.X = version number)

From a standard repository (such as CRAN-R)

$ module load R/X.X
$ R
> install.packages("PACKAGE")

From GitHub

$ module load R/X.X
$ R
> devtools::install_github("author/software")
or
> remotes::install_github("author/software")

From a tarball

$ module load R/X.X
$ R CMD INSTALL /path/package.tar.gz

Q: When I submit a job using the 'parallel' package, all threads seem to share a single CPU core instead of running on the separate cores I requested.

A: On SLURM you need to use --cpus-per-task to specify the number of cores available to your job. E.g.

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12

will allow mclapply or other functions from the 'parallel' package to run on all requested cores.
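A script can avoid hard-coding the core count by reading it from the job environment: SLURM sets the SLURM_CPUS_PER_TASK variable when --cpus-per-task is given. The sketch below shows the idea in Python; the helper name requested_cores is hypothetical. In R, the equivalent value would typically be read with Sys.getenv and passed to mclapply through its mc.cores argument.

```python
import os

def requested_cores(env=None):
    """Return the core count requested via --cpus-per-task, defaulting to 1."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK")
    # The variable is absent when --cpus-per-task was not specified
    return int(value) if value else 1
```

The result can then size a worker pool, for example multiprocessing.Pool(processes=requested_cores()).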

Jupyter

Q: Why do I see the following error message? (kernel).ipynb appears to have died. It will restart automatically.

A: This is typically caused by the kernel using more RAM than what was requested when starting the session. Increase your memory request.
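To estimate how much memory to request, one option is to check the peak memory of the kernel process from inside a notebook that runs successfully on a smaller input. This is a rough sketch using the standard library 'resource' module; the helper name peak_rss_mib is hypothetical, and the value covers only the current process.

```python
import resource
import sys

def peak_rss_mib():
    """Return the peak resident memory of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor
```

Running peak_rss_mib() after the heaviest cell gives a lower bound for the memory request; leave some headroom above it.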

Q: Why am I not able to spawn a Jupyter session?

A: One common cause for being unable to log in to a Jupyter (JupyterHub or Jupyter Notebook) session is running out of home space quota. See the FAQ item above for "No Space Left".

Another cause is conflicting packages being loaded when the session starts. In this case, look for errors in the session output and check whether any of the packages listed come from your local environment.

Performance

Q: Why is HiPerGator running so slow?

A: There are many reasons why users may experience unusually low performance while using HPG. First, users should ensure that the performance issues do not originate from their Internet service provider, home network, or personal devices.

Once the possible causes above have been ruled out, users should report the issue as soon as possible via the RC Support Ticketing System (https://support.rc.ufl.edu/enter_bug.cgi). When reporting the issue, please include detailed information such as:

Time when the issue occurred
JobID
Nodes being used, i.e. username@hpg-node$. Note: Login nodes are not considered high performance nodes and intense jobs should not be executed from them.
Paths, file names, etc.
Operating system
Method for accessing HPG: JupyterHub, Open OnDemand, or the terminal interface used.

If you are considering purchasing more resources, please visit the Purchase Allocation portal.

Q: Are there profiling tools installed on HiPerGator that help identify performance bottlenecks?

A: REMORA is the most general-purpose profiling tool we have on the cluster. More specific tools depend on the application stack or the language, e.g. cProfile for Python code, Nsight Compute for CUDA apps, or VTune for C/C++ and MPI code.
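As an illustration of language-level profiling, the sketch below wraps a function call with cProfile from the Python standard library and returns a short report sorted by cumulative time. The names slow_sum and profile_call are hypothetical examples, not tools installed on the cluster.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Toy workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the 'top' entries ranked by cumulative time
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For example, result, report = profile_call(slow_sum, 1_000_000) followed by print(report) shows where the time was spent.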

Q: Why is my job still pending?

A: According to the SLURM documentation, when a job cannot be started, a reason is immediately recorded in the job's "reason" field in the squeue output, and the scheduler moves on to the next job to consider.

Related article: Account and QOS limits under SLURM

Common reasons why jobs are pending

Priority: Resources are being reserved for a higher-priority job. This is particularly common for Burst QOS jobs; refer to the Choosing QOS for a Job page for details.
Resources: Required resources are in use
Dependency: Job dependencies not yet satisfied
Reservation: Waiting for advanced reservation
AssociationJobLimit: User or account job limit reached
AssociationResourceLimit: User or account resource limit reached
AssociationTimeLimit: User or account time limit reached
QOSJobLimit: Quality Of Service (QOS) job limit reached
QOSResourceLimit: Quality Of Service (QOS) resource limit reached
QOSTimeLimit: Quality Of Service (QOS) time limit reached
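The reason codes above can be looked up programmatically. The sketch below assumes squeue output produced with a format such as `squeue -h -u $USER -t PENDING -o "%i %r"` (job ID followed by reason, no header) and only parses that text; the names REASONS, pending_reasons, and explain are hypothetical, and reason codes not listed in this FAQ fall back to a generic message.

```python
REASONS = {
    "Priority": "Resources are being reserved for a higher-priority job",
    "Resources": "Required resources are in use",
    "Dependency": "Job dependencies not yet satisfied",
    "Reservation": "Waiting for advanced reservation",
    "AssociationJobLimit": "User or account job limit reached",
    "AssociationResourceLimit": "User or account resource limit reached",
    "AssociationTimeLimit": "User or account time limit reached",
    "QOSJobLimit": "QOS job limit reached",
    "QOSResourceLimit": "QOS resource limit reached",
    "QOSTimeLimit": "QOS time limit reached",
}

def pending_reasons(squeue_text):
    """Parse lines of '<jobid> <reason>' into a {jobid: reason} dict."""
    jobs = {}
    for line in squeue_text.splitlines():
        parts = line.split(None, 1)  # the reason may contain spaces
        if len(parts) == 2:
            jobs[parts[0]] = parts[1].strip()
    return jobs

def explain(reason):
    """Return a short explanation for a reason code."""
    return REASONS.get(reason, "Not listed here; see the SLURM documentation")
```

For each job in pending_reasons(text), explain(reason) gives the matching description from the list above.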

Revision as of 17:19, 26 January 2023 by Cabreraruizdiazj; revision as of 16:04, 13 February 2023 by Israel.herrera (Performance).