{{Note|The EL7->EL8 migration of compute and login nodes on HiPerGator and HPG-AI is complete. Make sure to remove the el7 constraint from your job scripts.|warn}}
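Removing the constraint is a one-line edit to the job script. The sketch below assumes GNU <code>sed</code> and a hypothetical job script named <code>myjob.sh</code>; substitute your own file name:

```shell
# Sketch: strip the retired el7 constraint out of an existing SLURM job
# script. "myjob.sh" is a hypothetical file name standing in for your own.
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --constraint=el7
#SBATCH --ntasks=1
EOF

# Delete the SBATCH line that pinned jobs to the now-retired EL7 nodes:
sed -i '/^#SBATCH --constraint=el7$/d' myjob.sh

cat myjob.sh
```

With the line gone, the scheduler is free to place the job on any (now EL8) node.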
==HiPerGator OS Upgrade: RHEL-7 → RHEL-8==
HiPerGator has completed a rolling upgrade from Red Hat Enterprise Linux version 7 (RHEL-7) to version 8 (RHEL-8). The upgraded software environment includes updates to important components including, but not limited to, NVIDIA GPU drivers, CUDA (12.2), communication libraries, compilers (gcc/12 and intel/2023), and OpenMPI (4.1.5).
Some programs and workflows run in the updated software environment without modification; others require recompilation or adaptation.
==Rebuilding Code for EL8==
'''Note:''' PMIx v2 is no longer supported in EL8.
If you see an error regarding PMIx v2, such as "''error while loading shared libraries: libpmix.so.2: cannot open shared object file: No such file or directory''", we recommend using the following environment module combinations to rebuild your code for EL8:
* gcc/12.2.0 openmpi/4.1.5
* cuda/12.2.2 gcc/12.2.0 openmpi/4.1.5
* intel/2020 openmpi/4.1.5
* cuda/12.2.2 intel/2020 openmpi/4.1.5
* nvhpc/23.7 openmpi/4.1.5
* cuda/12.2.2 nvhpc/23.7 openmpi/4.1.5
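On an EL8 login node, a rebuild with one of the module combinations above might look like the following sketch. These <code>module</code> commands assume HiPerGator's Lmod environment, and <code>my_app.c</code> and the plain <code>mpicc</code> call are placeholders for your own source and build system:

```shell
# Sketch only: assumes the Lmod "module" command and HiPerGator's EL8
# module tree; adjust for your own build system.
module purge                              # drop any EL7-era modules
module load gcc/12.2.0 openmpi/4.1.5      # load cuda/12.2.2 as well for GPU codes

# Recompile against the EL8 OpenMPI stack (placeholder source file):
mpicc -O2 -o my_app my_app.c
```

Rebuilding against openmpi/4.1.5 links your code to the current PMIx libraries, which avoids the libpmix.so.2 error described above.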