Reproducible Notebooks with Pixi
Data scientists and researchers love to work with Jupyter Notebooks. They are great for several reasons: they provide a way to interactively explore data, make plots, and share the contents in the form of "literate programming" (a term coined by Donald Knuth!).
Many Jupyter Notebook users also like to use conda packages to set up their environment. Before pixi, it was relatively tricky to get a reproducible conda environment set up. You could do it with community projects like conda-lock or conda-pack, but that was by no means easy.
For pixi, we drew on lessons from other tools, not only from the conda ecosystem but also from other ecosystems like npm, pip, and docker. We wanted to make it easy to create a reproducible environment and to share it with others.
Let us walk through the process of creating a shareable, reproducible pixi environment to work with JupyterLab.
First, we start with a pixi.toml file:
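A minimal sketch of such a file, matching the sections described in this post, might look like the following. The project name, version, description, author, platform list, and version constraints are placeholder assumptions rather than values taken from a real project.

```toml
# Sketch of a pixi.toml for a JupyterLab project.
# Name, version, description, author, and version constraints are placeholders.
[project]
name = "reproducible-notebooks"
version = "0.1.0"
description = "A reproducible JupyterLab environment"
authors = ["Jane Doe <jane@example.com>"]
channels = ["conda-forge"]
# Platforms the environment is resolved (and locked) for.
platforms = ["linux-64", "osx-64", "osx-arm64", "win-64"]

[tasks]
# `pixi run start` launches JupyterLab.
start = "jupyter lab"

[dependencies]
python = "3.11.*"
jupyterlab = ">=4.0"
```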
As you can see, we have a few sections in the pixi.toml file:
project: This section contains metadata about the project, such as the name, version, description, authors, and the channels to use for the environment.
tasks: This section contains a list of tasks that can be run with pixi. In this case, we have a single task called start that runs jupyter lab. Interestingly, we use a syntax that is very similar to bash and runs on all platforms.
dependencies: This section contains a list of dependencies that should be installed. These dependencies are installed from the channels specified in the project section. Later we will see how to add PyPI dependencies.
To create the environment, we can run pixi install. But it can be even simpler: we can just run pixi run start to execute the start task, and pixi will take care of everything: resolving, downloading, and installing the environment.
Reproducibility with lockfiles
If you execute pixi install or pixi run start, you might see multiple progress bars. That is because pixi is resolving the environment for all 4 platforms at once. This is a powerful feature of pixi: it can create a lockfile that contains the exact versions of all dependencies for all platforms. This lockfile can be shared with others, and they can use it to recreate the same environment that you have. The lockfile is written as pixi.lock and lives right next to the pixi.toml file. For most projects, we advise checking the lockfile into the git repository that hosts the rest of the code, so that you know which versions of the packages were used at the time of the last commit.
If you check in your lockfile, your coworkers will also have a very easy time getting started: no need to wait for any dependency resolutions! Pixi will just install the environment as specified in the lockfile.
Adding more dependencies
It wouldn't be science if you didn't use more than just jupyterlab and the Python standard library! To add more packages to your environment, you can either edit the toml file or add them via the pixi CLI, for example by running pixi add numpy. This will add the numpy package to the dependencies section of the pixi.toml file and install it from conda-forge (or any other channel you specified).
Sometimes there may be a dependency that is not yet on conda-forge. In that case, it should be just as easy to add it from PyPI, the Python Package Index, for example by running pixi add --pypi matplotlib. This will add the matplotlib package to the pypi-dependencies section of the pixi.toml file and install it from PyPI. It is also recorded in the lockfile. However, our advice is to use conda packages from conda-forge wherever possible, because they are safer and faster to install.
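After adding numpy from conda-forge and matplotlib from PyPI, the dependency sections of pixi.toml might look roughly like this. The version constraints are illustrative assumptions; the exact constraints pixi writes depend on the pixi version and on the package versions available at the time.

```toml
[dependencies]
python = "3.11.*"
jupyterlab = ">=4.0"
numpy = ">=1.26"      # conda package, resolved from the configured channels

[pypi-dependencies]
matplotlib = ">=3.8"  # resolved from PyPI instead of a conda channel
```

Keeping the two sources in separate tables makes it easy to see at a glance which packages still come from PyPI and could move to conda-forge later.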
Adding more tasks
If you want to prepare your data, download some datasets from the internet, or perform any other repetitive tasks, you can easily add more tasks to your pixi.toml file:
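A sketch of such a tasks section is shown below. The script paths are hypothetical, and the spelling of the dependency key is an assumption about your pixi version: older releases use depends_on, while newer ones use depends-on, so check the documentation for the release you have installed.

```toml
[tasks]
start = "jupyter lab"
# Hypothetical scripts; replace them with your own commands.
prepare = "python scripts/prepare.py"
# Declares that `prepare` has to run before `index_data`.
index_data = { cmd = "python scripts/index_data.py", depends_on = ["prepare"] }
```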
In this example, we have added two new tasks: prepare and index_data. The index_data task depends on the prepare task, so pixi will make sure that prepare is executed before index_data.
Read more
We hope this gives you a good overview of how to use pixi to create a reproducible environment for your Jupyter Notebooks. If you want to learn more, check out the pixi documentation. It also describes advanced features such as creating multiple environments with optional dependencies, e.g. for testing or building documentation.