Difference between revisions of "HiPerGator Metrics"
Revision as of 19:16, 9 December 2022
Accessing the HiPerGator Status Dashboard
1. You must have a valid HiPerGator account. If you need to request an account, see the Account Request page.
2. Use your browser to access https://metrics.rc.ufl.edu
3. You will be directed to the UF GatorLink login page (this step may be skipped if you have already authenticated to other UF resources)
4. Once authenticated, you will be shown a Grafana login page
5. Enter your GatorLink credentials and click Log In
6. You will be directed to the HiPerGator Status dashboard which should look like this:
If you do not land on this page, please contact Support or file a Bugzilla ticket.
Dashboard Panels Explained
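Number of Users
This panel shows the number of users per login node.
GPU Allocation
This panel shows the allocated percentage of GPUs per product family and overall.
The overall allocation includes all GPU families.
CPU Status
This panel shows the number of CPUs that are allocated, idle and reserved.
It also displays both the current value and the average value. The average is calculated over the chosen time range.
Direct link to this panel: https://metrics.rc.ufl.edu/d/e6ECgorZz/hipergator-status?orgId=2&refresh=5m&var-cluster=hipergator&var-partition=All&viewPanel=14
5 Minute Load Average
This panel shows the 5 minute load average of each login node.
A red threshold line indicates a higher than expected load average.
Slurm Jobs Started per Minute
This panel shows the number of jobs started by the Slurm scheduler every minute.
To view the value of a specific point, hover your mouse over a bar.
1 Minute Load Average
This panel shows the 1 minute load average of the login nodes.
It is similar to the 5 minute average, but it can spike higher at times.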
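For readers unfamiliar with load averages, the sketch below reads the same kind of 1 and 5 minute figures from the local operating system and compares the 5 minute value to a limit. It is only an illustration: the helper name load_is_high and the limit of 1.0 per CPU are assumptions made here, not the threshold or the data source the dashboard actually uses.

```python
import os


def load_is_high(load_avg, cpu_count, per_cpu_limit=1.0):
    """Return True when the load average exceeds per_cpu_limit per CPU.

    per_cpu_limit is an illustrative assumption, not the dashboard's threshold.
    """
    return load_avg > cpu_count * per_cpu_limit


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_min, five_min, _fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"1 min: {one_min:.2f}  5 min: {five_min:.2f}  CPUs: {cpus}")
    print("5 minute load is high:", load_is_high(five_min, cpus))
```

Because the 1 minute figure averages over a shorter window, it reacts faster to short bursts of activity, which is why it can spike higher than the 5 minute figure.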
General Dashboard Usage
The dashboard is restricted in several ways, but the settings described below can be changed.
Changing the Time Range
You can change the time range of the dashboard by clicking the box with the clock icon in the top right corner.
The default is "Last 3 hours". Clicking the box presents several preset options.
It's best to use one of the presets. Be advised that with a longer range, the dashboard panels may be more difficult to read.
Changing the Refresh Frequency
You can adjust how often the dashboard panels refresh.
Click the refresh icon in the top right corner. The default interval is 5 minutes.
Be advised that setting the refresh interval below 5 minutes increases the load on the backend servers and may result in poor dashboard performance.
Also, some of the panels only collect data every 15 minutes or longer, so a shorter refresh interval may have little to no effect on them.
Changing the Partition
You can select partitions of interest from the drop-down menu at the top left of the dashboard.
Currently, this only affects the CPU Status panel.
You may select multiple entries from the selector, or choose All (the default).
Maximize a Panel
To view only a single panel, click the panel title, then click View.
The panel then fills the entire dashboard, making it easier to read.
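The settings above can also be expressed as query parameters in the dashboard URL. The following is a minimal sketch, assuming the URL shape of the CPU Status panel link on this wiki page (orgId, refresh, var-cluster, var-partition, viewPanel) together with Grafana's standard from and to time-range parameters. The helper dashboard_url is hypothetical and not part of any UF tool, and the dashboard UID in the base URL may change.

```python
from urllib.parse import urlencode

# Base URL copied from the CPU Status panel link; the dashboard UID may change.
BASE_URL = "https://metrics.rc.ufl.edu/d/e6ECgorZz/hipergator-status"


def dashboard_url(time_range="now-3h", refresh="5m", partitions=("All",), view_panel=None):
    """Build a dashboard link (hypothetical helper for illustration only)."""
    params = [
        ("orgId", "2"),
        ("from", time_range),  # Grafana relative time, e.g. now-3h for "Last 3 hours"
        ("to", "now"),
        ("refresh", refresh),  # the page advises against going below 5m
        ("var-cluster", "hipergator"),
    ]
    # A multi-value variable is repeated once per selected partition.
    params += [("var-partition", name) for name in partitions]
    if view_panel is not None:
        # Opens a single maximized panel; 14 is the CPU Status panel in the link.
        params.append(("viewPanel", str(view_panel)))
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(dashboard_url())               # defaults: last 3 hours, 5 minute refresh
    print(dashboard_url(view_panel=14))  # CPU Status panel, maximized
```

You still need to authenticate with GatorLink before a link like this will load, and the partition names you pass must match entries in the dashboard's partition selector.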