Pixi - reproducible, scientific software workflows!

Written by Wolf Vollprecht 2 months ago

As a scientist, your focus should be on research, not on wrestling with software environments. At this year's SciPy conference, we're excited to show Pixi, a tool designed to handle the complexities of package management so you can dedicate more time to your research.

For our scientific projects, speed really matters, so we rely heavily on C and CUDA code. With pixi, I was able to set up a project for multiple CUDA versions, MPS (Apple Silicon / Metal Performance Shaders) as well as CPU support in no time, and make it work consistently for everyone.

Guillaume Lemaitre, scikit-learn core developer previously at INRIA

What is Pixi?

Pixi history

Pixi is a new package manager built on the foundation of the conda and conda-forge ecosystem. Created by the team behind mamba, Pixi leverages the extensive conda-forge distribution, which includes a vast array of scientific software packages such as Python, R, C/C++ libraries, NumPy, SciPy, and many others.

At its core, Pixi aims to solve three critical challenges in scientific software development:

  1. Collaboration: Enabling seamless sharing and reproduction of research code
  2. Reproducibility: Ensuring consistent execution across different machines
  3. Performance: Maximizing the utilization of available hardware resources

Let's explore how Pixi addresses each of these challenges, starting with collaboration.

Collaboration: The End of "Academic Code"

It's all too common for scientists and researchers to write code that solves their immediate problems but then can never be run again. And let's face it, scientists shouldn't have to be software engineers to share their work effectively. Pixi tackles this issue head-on with several key features:

  1. Lockfiles: A pixi.lock file provides a precise record of all package versions and their dependencies in your project environment. Unlike a typical requirements.txt file, which may specify version ranges, a lockfile pins exact versions and includes checksums for verification. This ensures that when a colleague runs your code, they're using the exact same environment you developed in, eliminating the infamous "works on my machine" problem.
  2. Pixi Tasks: Complex build steps (compiling, running CMake, installing compilers) become simple "pixi tasks". Tasks can declare dependencies on other tasks and cache their results, streamlining your workflow.
  3. Automated installation: Pixi automatically installs all necessary dependencies and executes predefined tasks. No more sudo apt-get install or brew install nightmares!
  4. Cross-Platform Compatibility: Collaborate seamlessly with colleagues using different hardware or operating systems.
  5. Simplified Setup: Gone are the days of lengthy setup instructions. With Pixi, your README can shrink to a single command: pixi run start installs all dependencies on any supported operating system and then executes the main entry point of your project.
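
As a rough sketch of how the features above can fit together in a single manifest, consider the following pixi.toml. The project name, the analysis.py script, the source layout, and the build commands are hypothetical and only serve as illustration; the depends-on and inputs fields are how a task can declare its prerequisites and the files that determine whether a cached result can be reused.

```toml
# pixi.toml (illustrative sketch; project name, paths, and commands are hypothetical)
[project]
name = "my-analysis"
channels = ["conda-forge"]
# One manifest and one lockfile covering three operating systems
platforms = ["linux-64", "osx-arm64", "win-64"]

[tasks]
# Configure and compile a C++ component with tools that Pixi installs
configure = "cmake -G Ninja -S . -B build"
build = { cmd = "cmake --build build", depends-on = ["configure"], inputs = ["CMakeLists.txt", "src/*.cpp"] }
# Entry point: runs "configure" and "build" first, then the analysis script
start = { cmd = "python analysis.py", depends-on = ["build"] }

[dependencies]
python = "3.12.*"
numpy = "*"
matplotlib = "*"
cmake = "*"
ninja = "*"
cxx-compiler = "*"
```

With a manifest along these lines, a collaborator would clone the repository and run pixi run start; Pixi resolves the environment from pixi.lock, runs the prerequisite tasks, and then starts the analysis.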

Short Readme thanks to Pixi

Thanks to lockfiles and pixi tasks, it is straightforward for your research collaborators (or reviewers) to recreate your plots and analyses on their own machines, or run your algorithms over their own data.

Pixi: High Reproducibility with Low Effort

When it comes to research and scientific computing environments, reproducibility is crucial. However, many existing tools fall short in balancing reproducibility with ease of use. Let's compare Pixi to some common alternatives:

  1. Pip, Mamba, and Conda: These popular package managers lack native "lockfile" functionality. To achieve reproducibility, you'd need to manually pin all version numbers or use additional tools like pip-compile. This process can be time-consuming and error-prone.
  2. Docker: While Docker containers offer a reproducible environment, creating and maintaining Dockerfiles requires significant effort. Moreover, Dockerfiles often involve multiple package managers (e.g., apt-get and pip), and since neither Dockerfiles nor apt-get understand lockfiles, a Dockerfile alone doesn't guarantee reproducibility.
  3. Poetry: Poetry is great, but it only handles Python packages, whereas Pixi also manages native packages such as the Python interpreter itself, C/C++ compilers, Node, OpenSSL, etc.
  4. Packer: Packer can be used to build full-fledged VM images, but it suffers from the same reproducibility issues as Docker and is also much harder to use.

Let's visualize this using a simple plot:

Reproducibility vs. Effort Matrix

As you can see, we think that Pixi occupies a sweet spot of high reproducibility and low effort. It combines the best of both worlds:

  • Like Docker, it ensures a consistent environment across different systems.
  • Like Poetry, it uses lockfiles for precise dependency management.
  • Unlike Docker, it doesn't require learning a new containerization paradigm. The software installed also works natively on your hardware, making full use of your system capabilities (CUDA, Apple Silicon, etc.). GUI applications are also easy to run.
  • Unlike manual version pinning, it automates the process of creating and updating lockfiles.

Pixi maximizes reproducibility while minimizing the time and effort required to set up and maintain your scientific computing environment. This allows you to focus more on your research and less on environment management.

And by the way, it’s also great when teaching students. Pixi makes it really simple to set up deterministic learning environments for all the students – regardless of their choice of laptop (macOS, Windows, Linux …).

Performance: Unleash Your Hardware's Potential

As a scientist, you're likely dealing with computationally intensive tasks that push your hardware to its limits. Maybe you're a geospatial researcher ingesting terabytes of satellite imagery and struggling with processing times. Or perhaps you're a genomics expert grappling with massive sequencing datasets that strain your system's memory. You might even be an AI researcher working with the latest open-source generative models, where every optimization counts.

Whatever your field, whether AI, weather simulation, or robotics, the last thing you need is software that doesn't take full advantage of your hardware. Pixi helps you use that hardware to its fullest:

  1. GPU Optimization: Thanks to collaboration between NVIDIA and conda-forge, Pixi provides access to optimized CUDA packages and GPU-accelerated builds of key libraries.
  2. Hardware-Aware Environments: The system-requirements feature ensures you're using the right packages for your specific hardware configuration.

Real-World Example: GPU-Accelerated Machine Learning with JAX

Let's see how Pixi simplifies setting up a GPU-accelerated machine learning environment using JAX:

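# pixi.toml
[project]
name = "jax-gpu-example"
version = "0.1.0"
description = "GPU-accelerated machine learning with JAX"
# Channels and platforms must be declared; linux-64 is assumed here for CUDA
channels = ["conda-forge"]
platforms = ["linux-64"]

[tasks]
train = "python train_model.py"

[system-requirements]
cuda = "12.0"

[dependencies]
python = "3.12.*"
jax = { version = "0.4.*" }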

With this configuration, researchers can simply run:

pixi run train

Pixi will set up the appropriate CUDA environment, install JAX with GPU support, and execute the training script. No manual CUDA setup required!

For example, the rerun team maintains a repository of impressive demos that use Pixi, each of which you can run with a single pixi run start command.

Rerun Demo for Carla

Pixi in the Scientific Ecosystem

Pixi doesn't exist in isolation – it's designed to complement and enhance your existing scientific workflow:

  1. Version Control Integration: While Git manages your code, Pixi manages your dependencies. The pixi.lock file can be version-controlled alongside your code, ensuring environment consistency across git commits.
  2. Jupyter Notebook Compatibility: Use Pixi to create consistent environments for Jupyter notebooks, ensuring that all users have the same package versions and dependencies.
  3. Docker Alternative: Unlike Docker, which requires learning a new syntax and dealing with image management, Pixi provides similar reproducibility benefits with a much gentler learning curve.
  4. Virtual Environment Enhancement: Pixi goes beyond traditional virtual environments by managing not just Python packages, but also system-level dependencies and even different programming languages within the same project.
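
As one possible illustration of the notebook and multi-language points, the sketch below declares a JupyterLab environment that ships both a Python and an R kernel. The project name and the exact package selection are assumptions; any packages available on conda-forge could be listed in the same way.

```toml
# pixi.toml (illustrative sketch; project name and package selection are assumptions)
[project]
name = "notebook-project"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-arm64", "win-64"]

[tasks]
# Start JupyterLab from inside the locked environment
lab = "jupyter lab"

[dependencies]
python = "3.12.*"
jupyterlab = "*"
# R interpreter and its Jupyter kernel, managed alongside Python
r-base = "*"
r-irkernel = "*"
```

Committing both pixi.toml and the generated pixi.lock to Git means that anyone who checks out a given commit and runs pixi run lab should get the same package versions for that commit.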

See you soon at SciPy?

Pixi is more than just a package manager - it aims to be a comprehensive solution for the challenges faced by scientific software developers. By simplifying collaboration, ensuring reproducibility, and optimizing performance, Pixi allows researchers to focus on what really matters: their scientific work.

Ready to get started? Visit our documentation to learn more about integrating Pixi into your research workflow, or check out our examples and leave a star on the GitHub repository. Even better, visit our booth at SciPy to see Pixi in action and discuss how we can support your specific research needs.

If you don't make it out to SciPy this year, do not hesitate to join our Discord to chat with us any time.