Rolling Upgrade from RHEL7 to RHEL8

From UFRC
{{Note|The EL7->EL8 migration of compute and login nodes on HiPerGator and HPG-AI is complete. Make sure to remove the el7 constraint from your job scripts.|warn}}
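With the migration finished, a job script that still pins <code>el7</code> can no longer match any nodes. A minimal sketch of stripping the obsolete directive with <code>sed</code> (the script name <code>job.sh</code> and its contents are purely illustrative):

```shell
# Create a throwaway example batch script that still pins el7.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --constraint=el7
#SBATCH --ntasks=1
srun ./my_app
EOF

# Delete the obsolete constraint line in place (keeping a .bak backup).
sed -i.bak '/^#SBATCH --constraint=el7$/d' job.sh

cat job.sh   # the remaining #SBATCH lines are unchanged
```

Editing the script by hand works just as well; the point is only that the <code>--constraint=el7</code> line must go.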
 
==HiPerGator OS Upgrade: RHEL-7 → RHEL-8==
HiPerGator has completed a rolling upgrade from Red Hat Enterprise Linux version 7 (RHEL-7) to version 8 (RHEL-8). The upgraded software environment includes updates to key components, including but not limited to NVIDIA GPU drivers, CUDA (12.2), communication libraries, compilers (gcc/12 and intel/2023), and OpenMPI (4.1.5).
  
Some programs and workflows run in the updated software environment without any modification, while others require recompilation or adaptation.
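A quick way to tell whether an existing binary needs a rebuild is to check whether it still resolves all of its shared libraries on the new OS image. A sketch, where <code>needs_rebuild</code> is a hypothetical helper and <code>/bin/ls</code> stands in for your own application binary:

```shell
# needs_rebuild BIN: report whether BIN has unresolved shared-library
# dependencies ("not found" in ldd output), a common symptom of a binary
# built against libraries from the old OS image.
needs_rebuild() {
    if ldd "$1" 2>/dev/null | grep -q "not found"; then
        echo "rebuild needed: $1"
    else
        echo "links OK: $1"
    fi
}

# /bin/ls is only a demo target; point this at your own executable.
needs_rebuild /bin/ls
```

A clean result does not guarantee correct behavior (ABI changes can bite even when linking succeeds), but "not found" lines are a reliable sign that a rebuild is required.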
  
==Rebuilding Code for EL8==

'''Note:''' PMIX v2 is no longer supported in EL8.

If you see an error regarding PMIx v2, such as "''error while loading shared libraries: libpmix.so.2: cannot open shared object file: No such file or directory''", we recommend rebuilding your code for EL8 with one of the following environment module combinations:

* gcc/12.2.0 openmpi/4.1.5
* cuda/12.2.2 gcc/12.2.0 openmpi/4.1.5
* intel/2020 openmpi/4.1.5
* cuda/12.2.2 intel/2020 openmpi/4.1.5
* nvhpc/23.7 openmpi/4.1.5
* cuda/12.2.2 nvhpc/23.7 openmpi/4.1.5
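On an EL8 login or compute node, a rebuild usually amounts to loading one of the recommended stacks above and recompiling. A sketch using the gcc stack, where <code>mpi_app.c</code> stands in for your own source; the <code>module</code> command exists only on HiPerGator, so this is not runnable elsewhere:

```shell
# Start from a clean environment, then load a recommended EL8 toolchain.
module purge
module load gcc/12.2.0 openmpi/4.1.5

# Recompile; mpi_app.c is a placeholder for your own source tree.
mpicc -O2 -o mpi_app mpi_app.c

# Sanity check: the binary should now resolve the new Open MPI/PMIx libraries.
ldd ./mpi_app | grep -Ei 'mpi|pmix'
```

For the other stacks, substitute the matching module combination from the list above (e.g. <code>module load cuda/12.2.2 gcc/12.2.0 openmpi/4.1.5</code> for GPU codes) and your project's usual build command.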
 

Latest revision as of 16:06, 28 September 2023
