Conda
Revision as of 20:14, 16 April 2024
For help creating and managing personal environments, whether for command-line tools or for Python packages used in SLURM jobs or Jupyter kernels, see Managing Python environments and Jupyter kernels.
Description
Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments. Keeping applications in separate conda environments allows otherwise incompatible dependencies, for example python2 and python3, to be installed side by side.
Note: For a faster conda see Mamba.
Environment Modules
Run module spider conda
to find out what environment modules are available for this application.
System Variables
- HPC_CONDA_DIR - installation directory
- HPC_CONDA_BIN - executable directory
Background
Many projects that use Python code require careful management of the respective Python environments. Rapid changes in package dependencies, package version conflicts, deprecation of APIs (function calls) by individual projects, and obsolescence of system drivers and libraries make it virtually impossible to use an arbitrary set of packages or create one all-encompassing environment that will serve everyone's needs over long periods of time. The high velocity of changes in the popular ML/DL frameworks and packages and GPU computing exacerbates the problem.
The problem with pip install
Most guides and project documentation for installing Python packages recommend pip install. While pip is easy to use and works for many use cases, it has some major drawbacks on a supercomputer like HiPerGator:
- Pip by default installs binary packages (wheels), which are often built on systems incompatible with HiPerGator. This can lead to import errors, and pip's attempts to build from source will fail without additional configuration.
- If you pip install a package that is or will be installed in an environment provided by UFRC, your pip-installed version takes precedence. Your dependencies eventually become incompatible and cause errors; even a single pip install can make environments unusable.
- Different packages may require different versions of the same dependency, leading to installation scenarios that are impossible to reconcile. This is hard to manage with pip, as there is no method to swap active versions.
- On its own, pip installs everything in one location: ~/.local/lib/python3.X/site-packages/
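To see where these per-user installs land for the Python currently on your PATH, the standard-library site module can report the directory. This is a minimal sketch; it assumes python3 is available in the current shell.

```shell
# Ask Python for the per-user site-packages directory used by `pip install --user`.
user_site="$(python3 -m site --user-site)"
echo "Per-user pip installs go to: ${user_site}"

# List what is already installed there, if the directory exists.
if [ -d "$user_site" ]; then
    ls "$user_site"
else
    echo "Nothing has been installed there yet."
fi
```

Anything listed there is visible to every Python of the same minor version, which is why a stray install can affect unrelated environments.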
Conda and Mamba to the rescue!
conda and the newer, faster, drop-in replacement mamba were written to solve some of these issues. They represent a higher level of packaging abstraction that can combine compiled packages, applications, and libraries as well as pip-installed Python packages. They also allow easier management of project-specific environments and switching between environments as needed. They make it much easier to report the exact configuration of packages in an environment, facilitating reproducibility. Moreover, conda environments don't even have to be activated to be used; in most cases adding the path to the conda environment's 'bin' directory to the $PATH in the shell environment is sufficient for using them.
A caveat
conda and mamba get packages from channels, which are repositories of prebuilt packages. While there are several available channels, like the main conda-forge, not every Python package is available from a conda channel, as it has to be packaged for conda first. You may still need to use pip to install some packages as noted later. However, conda still helps manage environments by installing packages into separate directory trees rather than installing everything into a single folder, as pip does.
Configuration
The ~/.condarc configuration file
conda's behavior is controlled by a configuration file in your home directory called .condarc. The dot at the start of the name means that the file is hidden from the 'ls' file listing command by default. If you have not run conda before, you won't have this file. Whether the file exists or not, the steps here will help you modify it to work best on HiPerGator. The first time you load the conda environment module on HiPerGator, the current best-practice .condarc is placed in your home directory.
conda package cache location
conda caches (keeps a copy of) all downloaded packages by default in the ~/.conda/pkgs directory tree. If you install a lot of packages you may end up filling up your home quota. You can change the default package cache path. To do so, add or change the pkgs_dirs setting in your ~/.condarc configuration file, e.g.:
pkgs_dirs:
  - /blue/mygroup/share/conda/pkgs
  # or alternatively:
  - /blue/mygroup/$USER/conda/pkgs
Replace mygroup with your actual group name.
conda environment location
conda puts all packages installed in a particular environment into a single directory. By default, named conda environments are created in the ~/.conda/envs directory tree. They can quickly grow in size and, especially if you have many environments, fill the 40GB home directory quota. For example, the environment we will create in this training is 5.3GB in size. As such, it is important to use path based (conda create -p PATH) conda environments, which let you choose any path for a particular environment. For example, you can keep a project-specific conda environment close to the project data in /blue/, where your group has terabytes of space.
You can also change the default path for named environments (conda create -n NAME) if you prefer to keep all conda environments in the same directory tree. To do so, add or change the envs_dirs setting in the ~/.condarc configuration file, e.g.:
envs_dirs:
  - /blue/mygroup/share/conda/envs
  # or alternatively:
  - /blue/mygroup/$USER/conda/envs
Replace mygroup with your actual group name.
One way to edit your ~/.condarc file is to type: nano ~/.condarc
If the file is empty, paste in the text below, editing the envs_dirs and pkgs_dirs settings as shown. If the file has contents, update those lines.
~/.condarc should look something like this when you are done editing (again, replacing group and user in the paths with your group and username).
channels:
  - conda-forge
  - bioconda
  - defaults
envs_dirs:
  - /blue/group/user/conda/envs
pkgs_dirs:
  - /blue/group/user/conda/pkgs
auto_activate_base: false
auto_update_conda: false
always_yes: false
show_channel_urls: false
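After editing, it can be useful to confirm that conda actually reads the new settings. The sketch below uses conda config, which is part of the conda command line; it only reads the configuration and assumes the conda module has already been loaded.

```shell
# Read-only check of the settings conda will use; nothing is modified.
if command -v conda >/dev/null 2>&1; then
    # Show the effective values of the two directory settings.
    conda config --show envs_dirs pkgs_dirs
    # Show which configuration files the values came from.
    conda config --show-sources
    condarc_checked="yes"
else
    echo "conda not found; run 'module load conda' first." >&2
    condarc_checked="no"
fi
```

If the /blue paths do not appear at the top of each list, the most likely cause is a misspelled key or broken indentation in ~/.condarc.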
Create and activate a Conda environment
UF Research Computing Applications Team uses conda for many application installs behind the scenes. We are happy to install applications on request for you. However, if you would like to use conda to create multiple environments for your personal projects we encourage you to do so. Here are some recommendations for successful conda use on HiPerGator.
- See https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html for the original documentation on managing conda environments.
- We recommend creating environments by 'path', so they won't fill up your home directory (check quota with home_quota). The resulting environment should be located in the project(s) directory tree in /blue for better tracking of installs and better filesystem performance compared to home.
To make sure your code will run on GPUs, install a recent cudatoolkit package that works with the NVIDIA drivers on HPG (currently 12.x, but older versions are still supported) alongside the pytorch or tensorflow package(s). See RC provided tensorflow or pytorch installs for examples if needed. Mamba can detect whether there is a GPU in the environment, so the easiest approach is to run the mamba install command in a GPU session. Alternatively, if you are installing on a node without a GPU, or if a CPU-only pytorch package was already installed, you can explicitly require a GPU build of pytorch when running mamba install. The package spec is quoted so the shell does not try to expand the asterisk. E.g.
mamba install cudatoolkit=11.3 "pytorch=1.12.1=gpu_cuda*" -c pytorch
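Once the install finishes, a quick check from inside the environment can confirm that a GPU build was picked up. This is a sketch that assumes python3 resolves to the environment's interpreter; torch.cuda.is_available() is PyTorch's own check, and it reports False on a node without a GPU even when a GPU build is installed.

```shell
# Ask PyTorch whether it can see a CUDA device; report clearly if torch is missing.
gpu_check="$(python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch is not installed in this environment")
else:
    print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
EOF
)"
echo "$gpu_check"
```

Running the same check in a GPU session and on a CPU-only node helps separate a wrong package build from a missing device.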
Load the conda
module
Before we can run conda
or mamba
on HiPerGator, we need to load the conda
module:
module load conda
Create your first environment
Create a name based environment
To create your first name based conda environment (see the path based instructions below), run the following command. In this example, we create an environment named hfrl:
mamba create -n hfrl
The screenshot to the right is the output from running that command. Yours should look similar.
Note: You do not need to manually create the folders that you set up in your ~/.condarc file. mamba will take care of that for you.
Create a path based environment
To create a path based conda
environment use the '-p PATH' argument:
mamba create -p PATH
e.g.
mamba create -p /blue/mygroup/share/project42/conda/envs/hfrl/
Activate the new environment
To activate our environment (whether created with mamba or conda), we use the conda activate env_name command. Let's activate our new environment:
conda activate hfrl
or
conda activate /blue/mygroup/share/project42/conda/envs/hfrl/
Notice that your command prompt changes when you activate an environment to indicate which environment is active, showing that in parentheses before the other information:
(hfrl) [magitz@c0907a-s23 magitz]$
Note: path based environment activation is really only needed for package installation. For using the environment just add the path to its bin
directory to $PATH in your job script.
Once you are done installing packages inside the environment, you can deactivate it with:
conda deactivate
We do not recommend activating conda environments when using them, i.e. when running programs installed in the environments. Please prepend the path to that environment's bin directory to your $PATH instead.
E.g., if you have a project-specific conda environment at '/home/myuser/envs/project1/', add the following to your job script before executing any commands:
export PATH=/home/myuser/envs/project1/bin:$PATH
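The effect of prepending a bin directory can be seen without any conda environment at all. The sketch below builds a throwaway directory containing a hypothetical hello program (both are placeholders, not part of HiPerGator) and shows that the shell resolves the command from the prepended path first.

```shell
# Stand-in for an environment directory such as /home/myuser/envs/project1
demo_env="$(mktemp -d)"
mkdir -p "${demo_env}/bin"

# A placeholder executable playing the role of a program installed in the environment.
printf '#!/bin/sh\necho "hello from demo env"\n' > "${demo_env}/bin/hello"
chmod +x "${demo_env}/bin/hello"

# Prepend the environment's bin directory, as you would in a job script.
export PATH="${demo_env}/bin:${PATH}"

# The shell now finds `hello` in the prepended directory first.
resolved="$(command -v hello)"
echo "hello resolves to: ${resolved}"
hello
```

Because the directory is placed at the front of $PATH, programs in the environment shadow any system-wide programs of the same name for the rest of the job.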
Export or import an environment
Export your environment to an environment.yml
file
Now that you have your environment working, you may want to document its contents and/or share it with others. The environment.yml
file defines the environment and can be used to build a new environment with the same setup.
To export an environment file from an existing environment, run:
conda env export > hfrl.yml
You can inspect the contents of this file with cat hfrl.yml
. This file defines the packages and versions that make up the environment as it is at this point in time. Note that it also includes packages that were installed via pip
.
Create an environment from a yaml file
If you share the environment yaml file created above with another user, they can create a copy of your environment using the command:
conda env create --file hfrl.yml
They may need to edit the last line of the file (the prefix: entry) so that it matches where they want their environment created.
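The prefix: entry records where the original environment lived. One way to share the file, sketched below, is to drop that line and let the recipient choose the location with -p when creating the environment. The sample file contents and the target path are illustrative placeholders, not output from a real export.

```shell
# Work in a scratch directory so no real files are touched.
workdir="$(mktemp -d)"

# Illustrative stand-in for the output of `conda env export > hfrl.yml`.
cat > "${workdir}/hfrl.yml" <<'EOF'
name: hfrl
channels:
  - conda-forge
dependencies:
  - python=3.10
prefix: /blue/mygroup/share/project42/conda/envs/hfrl
EOF

# Remove the machine-specific prefix line before sharing the file.
grep -v '^prefix:' "${workdir}/hfrl.yml" > "${workdir}/hfrl-portable.yml"
cat "${workdir}/hfrl-portable.yml"

# The recipient can then pick a location explicitly (placeholder path):
#   conda env create --file hfrl-portable.yml -p /blue/theirgroup/share/conda/envs/hfrl
```

Keeping the original export alongside the portable copy preserves a record of where the source environment was installed.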
Group environments
It is possible to create a shared environment accessed by a group on HiPerGator, storing the environment in, for example, /blue/group/share/conda. In general, this works best if only one user has write access to the environment. All installs should be made by that one user and communicated to the other users in the group.
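One way to enforce the single-writer arrangement is with ordinary file permissions. The sketch below is demonstrated on a scratch directory and assumes standard POSIX permissions govern access; on the real shared path the maintainer would run the same chmod on the environment directory.

```shell
# Scratch stand-in for a shared environment under /blue/group/share/conda
shared_env="$(mktemp -d)"
mkdir -p "${shared_env}/bin"
touch "${shared_env}/bin/tool"

# Owner keeps full access; group members can read and traverse but not write;
# everyone else gets no access.
chmod -R u+rwX,g+rX,g-w,o-rwx "$shared_env"

# Inspect the resulting permissions.
ls -ld "$shared_env" "${shared_env}/bin/tool"
```

The capital X grants execute permission only on directories and on files that are already executable, so data files do not become executable by accident.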