Rolling Upgrade from RHEL7 to RHEL8

{{Note|The EL7→EL8 migration of compute and login nodes on HiPerGator and HPG-AI is complete. Make sure to remove the <code>el7</code> constraint from your job scripts.|warn}}
 
==HiPerGator OS Upgrade: RHEL-7 → RHEL-8==
 
HiPerGator has gone through a rolling upgrade from Red Hat Enterprise Linux version 7 (RHEL-7) to version 8 (RHEL-8). The upgraded software environment includes updates to important components including, but not limited to, NVIDIA GPU drivers, CUDA (12.2), communication libraries, compilers (gcc/12 and intel/2023), and OpenMPI (4.1.5).
  
 
Some programs/workflows run in the updated software environment without any modification, and some require recompilation or adaptation.
==What happened==
Starting on August 16, 2023, a subset of HiPerGator compute resources went online with the updated software environment. On the same day, a new login node pool was introduced as <code>hpg-el8.rc.ufl.edu</code> that ran the updated software environment. Those login nodes include login nodes 7 through 12 and can be used for rebuilding code for EL8. In the middle of the transition, which finished around September 20th, we switched the login nodes for hpg.rc.ufl.edu to point to EL8 and renamed the old EL7 login nodes (1 through 6) to hpg-el7.rc.ufl.edu.
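For example, during the transition you could reach the EL8 login pool directly to rebuild your code (replace the placeholder with your GatorLink username):

 ssh <GatorLink-username>@hpg-el8.rc.ufl.edu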
 
 
 
Computational resources were steadily converted until they were all upgraded. We recommend moving any remaining workflows to RHEL-8 at your earliest convenience.
 
 
 
==Choosing RHEL-7 or RHEL-8==
 
  
Using the <code>--constraint</code> SLURM directive, it is possible to submit jobs specifically targeting RHEL-7 with <code>--constraint=el7</code> or RHEL-8 with <code>--constraint=el8</code>. If neither is specified, the scheduler will not consider the operating system when placing your jobs, and you may get either version, or a mix, depending on the number of requested nodes.
  
 #SBATCH --constraint=el8
or
 #SBATCH --constraint=el7
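For example, a minimal job script targeting EL8 nodes could look like the following sketch (the resource requests, module versions, and program name are illustrative):

 #!/bin/bash
 #SBATCH --job-name=el8-test
 #SBATCH --constraint=el8
 #SBATCH --ntasks=4
 #SBATCH --mem=4gb
 #SBATCH --time=00:10:00
 
 module load gcc/12.2.0 openmpi/4.1.5
 srun ./my_mpi_program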
 
 
 
==How long did RHEL-7 systems remain available on HiPerGator?==
 
 
 
The upgrade process was expected to complete at midnight on September 20, 2023, and has since finished. If you have concerns about access to RHEL-7 software on HiPerGator, please [https://support.rc.ufl.edu/ contact us] as soon as possible to discuss.
 
 
 
'''Note:''' If you submitted a job with <code>--constraint=el7</code> with a time limit that extended beyond 2023-09-20T00:00:00, i.e. the end of the day on 2023-09-19, the job '''would not start'''.
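For instance, a job submitted shortly before the deadline as follows could never fit its seven-day time limit before the el7 nodes were removed, so it would have stayed pending indefinitely (the script name and time limit are illustrative):

 sbatch --constraint=el7 --time=7-00:00:00 my_job.sh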
 
 
 
==Rebuilding Code for EL8==

'''Note:''' PMIX v2 is no longer supported in EL8.

If you see an error regarding pmix v2, such as "''error while loading shared libraries: libpmix.so.2: cannot open shared object file: No such file or directory''", we recommend using the following environment modules to rebuild your code for EL8:
 
 
 
* gcc/12.2.0 openmpi/4.1.5
* cuda/12.2.2 gcc/12.2.0 openmpi/4.1.5
* intel/2020 openmpi/4.1.5
* cuda/12.2.2 intel/2020 openmpi/4.1.5
* nvhpc/23.7 openmpi/4.1.5
* cuda/12.2.2 nvhpc/23.7 openmpi/4.1.5
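As a concrete illustration, rebuilding an MPI application against one of these stacks could look like the following sketch (the source directory and build commands are hypothetical; substitute your usual build steps):

 module purge
 module load gcc/12.2.0 openmpi/4.1.5
 cd ~/my_mpi_app                # hypothetical source tree
 make clean && make             # or re-run your usual configure/cmake step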
==Known Issues==
 
===Missing PMIX_V2===
 
If you see an error similar to the following:

 srun: error: Couldn't find the specified plugin name for mpi/pmix_v2 looking at all files

or

 error while loading shared libraries: libpmix.so.2: cannot open shared object file: No such file or directory

then your job likely landed on an EL8 node, which does not have PMIx v2. Make sure to rebuild your code against a modern OpenMPI stack (openmpi/4.1.5) to work on EL8. During the transition it was possible to use the el7 SLURM constraint to continue running analyses, but the number of el7 nodes reached zero by September 20th, so any code that still produces these errors must be rebuilt.
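One quick way to tell whether an existing binary still depends on the removed library is to inspect its shared-library dependencies; you can also list the MPI plugin types available to <code>srun</code> (the binary name below is a placeholder):

 # look for a libpmix.so.2 dependency in an existing build
 ldd ./my_mpi_program | grep -i pmix
 
 # list the MPI plugin types srun supports on the current node
 srun --mpi=list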
 
