Managing Python environments and Jupyter kernels
Revision as of 15:02, 24 May 2022
Managing project-specific application Python environments
Background
Many projects that use Python code require careful management of their Python environments. Rapid changes in package dependencies, version conflicts between packages, deprecation of APIs (function calls) by individual projects, and obsolescence of system drivers and libraries make it virtually impossible to rely on an arbitrary set of packages, or to build one all-encompassing environment that serves everyone's needs over long periods of time. The fast pace of change in popular ML/DL frameworks and packages, and in GPU computing, makes the problem worse.
The problem with pip install
Most guides and project documentation recommend pip install for installing Python packages. While pip is easy to use and works for many use cases, it has some major drawbacks. If you have spent any time working in Python, you have likely seen (and may have run) suggestions to pip install ____, or !pip install ____ within Jupyter, to install one or more packages. There are a few issues with running pip install on a supercomputer like HiPerGator, though:
- By default, pip installs prebuilt binary packages (wheels), which are often built on systems incompatible with HiPerGator. If you pip install such a package and then try to import it, you may see an error about missing symbols or the GLIBC version.
- If a package has no binary distribution (wheel), pip install will try to build it from source, and that build will likely fail without additional configuration.
- If you pip install a package that is already installed, or will later be installed, in an environment that UFRC provides through an environment module (or a Jupyter kernel), your copy will take precedence over the one in that environment. Eventually package dependencies will become incompatible, and you will run into installation errors, import errors, or missing or changed functions (API changes). An innocuous pip install of a single package can change the environment drastically and render it unusable.
- Different packages may require different versions of the same dependency, leading to installation scenarios that are impossible to reconcile. This is hard to manage with pip, which has no way to swap the active version of a package.
- On its own, pip installs everything in one location: ~/.local/lib/python3.X/site-packages/. For any given version of Python, all the packages you install end up in that same directory.
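To check where your own user-level installs land and whether they sit ahead of an environment's packages, you can ask the interpreter directly. The following is a minimal sketch using only the standard library (site.getusersitepackages, site.ENABLE_USER_SITE, and sys.path). The function name describe_user_site is hypothetical, and treating every other directory named site-packages or dist-packages as "the environment" is a simplifying assumption.

```python
import os
import site
import sys


def describe_user_site():
    """Report where user-level pip installs go and how that directory ranks
    on sys.path relative to the interpreter's other site-packages entries."""
    # Per-user directory used by `pip install --user`, e.g.
    # ~/.local/lib/python3.X/site-packages on Linux.
    user_site = os.path.abspath(site.getusersitepackages())
    # True when enabled; False or None when disabled (for example by
    # `python -s` or the PYTHONNOUSERSITE environment variable).
    enabled = site.ENABLE_USER_SITE is True
    paths = [os.path.abspath(p) for p in sys.path]
    # The user directory only appears on sys.path if it exists and is enabled.
    user_index = paths.index(user_site) if user_site in paths else None
    # Assumption: any other site-packages/dist-packages entry belongs to the
    # active environment (conda env, environment module, or Jupyter kernel).
    env_indices = [
        i for i, p in enumerate(paths)
        if i != user_index
        and os.path.basename(p) in ("site-packages", "dist-packages")
    ]
    # Python imports the first match it finds on sys.path, so the user copy
    # of a package wins if the user directory is searched first.
    shadows = (
        user_index is not None
        and bool(env_indices)
        and all(user_index < i for i in env_indices)
    )
    return {
        "user_site": user_site,
        "enabled": enabled,
        "user_index": user_index,
        "shadows_environment": shadows,
    }


if __name__ == "__main__":
    for key, value in describe_user_site().items():
        print(f"{key}: {value}")
```

Run it with the python of the environment you care about. If shadows_environment is True, packages in your user directory override same-named packages in that environment; starting Python with -s, or setting PYTHONNOUSERSITE, keeps the user directory off sys.path.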
Conda and Mamba to the rescue!
conda and the newer, faster, drop-in replacement mamba were written to solve some of these issues. They work at a higher level of packaging abstraction and can combine compiled packages, applications, and libraries as well as pip-installed Python packages. They also make it easier to manage project-specific environments and to switch between them as needed, and much easier to report the exact configuration of packages in an environment, which facilitates reproducibility (recreating the environment on a different system). Moreover, conda environments don't even have to be activated to be used: in most cases, adding the environment's bin directory to $PATH in your shell is sufficient.
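In a shell this is typically a line such as export PATH=/path/to/env/bin:$PATH. The same idea can be used from Python when launching tools as subprocesses. Below is a minimal standard-library sketch that copies the current environment variables and prepends an environment's bin directory to PATH, so that environment's executables are found first without conda activate. The helper name env_with_conda_bin and the prefix ~/conda/envs/myproject are hypothetical placeholders; substitute the location of one of your own environments.

```python
import os
import shutil


def env_with_conda_bin(env_prefix, base_env=None):
    """Return a copy of base_env (default: os.environ) whose PATH starts with
    <env_prefix>/bin, so that environment's executables are found first."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(env_prefix, "bin")
    current = env.get("PATH", "")
    # Prepend rather than append: earlier PATH entries take precedence.
    env["PATH"] = bin_dir + os.pathsep + current if current else bin_dir
    return env


if __name__ == "__main__":
    # Hypothetical environment location; replace with one of your own.
    prefix = os.path.expanduser("~/conda/envs/myproject")
    env = env_with_conda_bin(prefix)
    # Show which `python` the modified PATH resolves to, without running it.
    print("python ->", shutil.which("python", path=env["PATH"]))
    # To run a tool from the environment, pass the same mapping, e.g.
    # subprocess.run(["python", "--version"], env=env, check=True)
```

Activation also sets variables such as CONDA_PREFIX and runs any activation scripts that packages install, so a few packages may still expect a real conda activate; for most command-line tools the PATH change alone is enough.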
See the UFRC Help page on conda (https://help.rc.ufl.edu/doc/Conda) for additional information.