Rolling Upgrade from RHEL7 to RHEL8

Missing libcrypto or libssl 1.0

The system libcrypto and libssl libraries have been upgraded from version 1.0 (shown as version 10 in some errors) to version 1.1 in EL8. If your code is built against the old libraries, it will have to be rebuilt before the end of the transition.
 

Revision as of 20:43, 13 September 2023

HiPerGator OS Upgrade: RHEL-7 → RHEL-8

HiPerGator is beginning a rolling upgrade from Red Hat Enterprise Linux version 7 (RHEL-7) to version 8 (RHEL-8). The upgraded software environment will include updates to important components including, but not limited to, NVIDIA GPU drivers, CUDA (12.x), communication libraries, compilers (gcc/12 and intel/2023), and OpenMPI (4.1.5).

Some programs/workflows will run in the updated software environment without any modification, and some will require recompilation or adaptation.

What will happen

On August 16, 2023, a subset of HiPerGator compute resources will go online with the updated software environment. On the same day, a new login node pool running the updated software environment will be introduced as hpg-el8.rc.ufl.edu. If you need to rebuild your application, you can do so on these login nodes.

Over the next 6-7 weeks, additional compute resources will be upgraded at a steady rate until all of them are upgraded. Once more than half of the compute resources are upgraded, the new login nodes will become the default, reachable as hpg.rc.ufl.edu.

We recommend moving to RHEL-8 at your earliest convenience.

Choosing RHEL-7 or RHEL-8

With the '--constraint' SLURM directive, you can submit jobs that specifically target RHEL-7 ('--constraint=el7') or RHEL-8 ('--constraint=el8'). If neither is specified, the scheduler does not consider the operating system when placing your jobs, so a job may land on either type of node, or on a mix of both when it requests multiple nodes.

#SBATCH --constraint=el8

or

#SBATCH --constraint=el7
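As an illustration of where the directive goes, here is a minimal sketch of a complete batch script. The job name, task count, and time limit are placeholder values, and `report_node` is a hypothetical helper; only the `--constraint` line comes from the text above. The script prints the kernel release because RHEL kernel release strings contain "el7" or "el8", which makes it easy to confirm where the job actually ran.

```shell
#!/bin/bash
# Placeholder job name, task count, and time limit; adjust for your workload.
#SBATCH --job-name=os_check
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
# Change el8 to el7 to target RHEL-7 nodes instead.
#SBATCH --constraint=el8

# Hypothetical helper: print the node name and kernel release.
# On RHEL the kernel release string contains "el7" or "el8".
report_node() {
    echo "Job ran on $(uname -n), kernel $(uname -r)"
}

report_node
```

Submitting this script with sbatch and reading the job output should show whether the constraint was honored.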

How long will RHEL-7 systems remain available on HiPerGator?

At this time, we expect to complete the upgrade process on September 20, 2023. If you have concerns about access to RHEL-7 systems on HiPerGator after September 20, 2023, please contact us as soon as possible to discuss.

Rebuilding Code for EL8

We recommend using the following environment modules to rebuild your code for EL8:

  • gcc/12.2.0 openmpi/4.1.5
  • cuda/12.2.0 gcc/12.2.0 openmpi/4.1.5
  • intel/2020 openmpi/4.1.5
  • cuda/12.2.0 intel/2020 openmpi/4.1.5
  • nvhpc/23.5 openmpi/4.1.5

Known Issues

The 'module' command is missing

If the job log shows an error about a missing 'module' command, make sure to:

  • Submit the job from the appropriate login environment.
    • To use EL7 resources connect to hpg-el7.rc.ufl.edu or ssh into login 1 through 6 within HPG before submitting the job.
    • To use EL8 resources connect to hpg.rc.ufl.edu or ssh into login 7 through 12 within HPG before submitting the job.
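One way to confirm which environment a login node provides before submitting is to read the OS version from /etc/os-release. The sketch below assumes that file has the usual `VERSION_ID` field; `os_major` and `matches_constraint` are hypothetical helper names, not site-provided commands.

```shell
# Hypothetical helper: print the major OS version (for example 7 or 8)
# from an os-release style file (defaults to /etc/os-release).
os_major() {
    sed -n 's/^VERSION_ID="\{0,1\}\([0-9]*\).*/\1/p' "${1:-/etc/os-release}" 2>/dev/null
}

# Hypothetical helper: succeed only if this host matches the wanted
# constraint name (el7 or el8).
matches_constraint() {
    local want="$1" file="${2:-/etc/os-release}"
    [ "el$(os_major "$file")" = "$want" ]
}

# Example: warn before submitting an EL8 job from a mismatched login node.
if ! matches_constraint el8; then
    echo "Warning: this login node is not EL8; connect to an EL8 login node first." >&2
fi
```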

Missing PMIX_V2

If you see an error similar to the following:

srun: error: Couldn't find the specified plugin name for mpi/pmix_v2 looking at all files

or

error while loading shared libraries: libpmix.so.2: cannot open shared object file: No such file or directory

then your job likely landed on an EL8 node, which does not have pmix v2. Rebuild your code against a modern OpenMPI stack (openmpi/4.1.5 or later) so that it works on EL8 before the end of the transition. Until the code is rebuilt, use the el7 SLURM constraint to continue running your analyses.
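To check whether a binary still depends on libraries that are absent on EL8, you can inspect the output of `ldd`, which marks unresolved libraries with "=> not found". The sketch below is one possible way to do that; `missing_libs` is a hypothetical helper, and `my_app` and `job.sh` are placeholder names.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# shared libraries that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on an EL8 node (my_app is a placeholder for your own binary):
#   ldd ./my_app | missing_libs
#
# If libpmix.so.2 is listed, the binary still needs a rebuild. In the
# meantime, submit with the el7 constraint (job.sh is a placeholder):
#   sbatch --constraint=el7 job.sh
```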