Introducing rip - the fast & barebones pip implementation

Written by Wolf Vollprecht 6 months ago

Introduction

It's no secret anymore that we've been working on a shiny new thing: a pip/PyPI package resolver and installer in Rust. We are doing this to build a strong bridge between the two worlds: conda and PyPI. Many real-world “environments” mix PyPI and conda-forge packages. rip is a low-level library that we are integrating into pixi.

Conda can also mix PyPI and conda packages, but only in a pretty limited fashion: it just invokes pip after installing packages with conda. We are planning to integrate PyPI support more tightly into pixi, which includes adding the PyPI packages to the lockfiles. Our goal is a blissful experience for pixi users.

We took all the experience we gained from implementing the low-level rattler crates and the Rust SAT solver resolvo, and applied it to a new problem: PyPI packages.

There are a few key differences between Python packages and conda packages:

  • Metadata fetching: With conda you get all the metadata upfront, from a central "index" that describes all the available package versions and dependencies. This file is called "repodata" and is encoded as JSON.
  • Wheel file metadata: On PyPI, the package metadata is stored inside the wheel files (or is nonexistent for source distributions, sdists). To get to the metadata, the wheel files need to be downloaded, and sdists need to be built into a wheel first. The situation is a little better for uploads made since PEP 658, where the metadata is also stored alongside the wheel file and can be downloaded individually. The Python package metadata inside the wheel files also uses a completely different format (conda uses JSON, while Python packages use a custom format).
  • Dependency specs: Conda and PyPI use similar dependency range specifier syntax, but the details are quite different. PyPI supports dev and prerelease versions that conda does not yet have.
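To make the "custom format" concrete: the METADATA file inside a wheel is a list of email-header-style `Key: value` lines, followed by a blank line and the long description. The following is a minimal sketch of reading a few of those fields. The `CoreMetadata` struct and `parse_metadata` function are hypothetical names for illustration and are not rip's API; continuation lines and most fields are deliberately ignored.

```rust
/// A tiny, hypothetical subset of the fields found in a wheel's METADATA file.
#[derive(Debug, Default, PartialEq)]
pub struct CoreMetadata {
    pub name: Option<String>,
    pub version: Option<String>,
    pub requires_dist: Vec<String>,
}

/// Parse the header section of a METADATA file (illustrative, not complete).
pub fn parse_metadata(text: &str) -> CoreMetadata {
    let mut meta = CoreMetadata::default();
    for line in text.lines() {
        // A blank line ends the headers; the long description follows.
        if line.trim().is_empty() {
            break;
        }
        // Continuation lines start with whitespace; this sketch skips them.
        if line.starts_with(' ') || line.starts_with('\t') {
            continue;
        }
        // Split on the first colon only, so values may contain colons.
        if let Some((key, value)) = line.split_once(':') {
            let value = value.trim().to_string();
            match key.trim().to_ascii_lowercase().as_str() {
                "name" => meta.name = Some(value),
                "version" => meta.version = Some(value),
                // Requires-Dist may appear many times, once per dependency.
                "requires-dist" => meta.requires_dist.push(value),
                _ => {}
            }
        }
    }
    meta
}
```

Calling `parse_metadata` on the text of a METADATA file yields the package name, its version, and one raw `Requires-Dist` string per dependency. Each of those strings would still need to be parsed as a dependency specifier before a solver can use it.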

Step 1: Making the solver generic

When we started out with our own SAT solver we made it work for conda packages - because that is what we know best and where we initially wanted to make use of the solver. But we soon had the vision to also handle other package formats. PyPI is a clear first choice given the history of the conda project and the existing ecosystem of “environment.yml” files that mix conda and pip.

That is why step 1 was to make the solver standalone and generic – we removed every trace of conda from it, and renamed it resolvo. It now lives in its own GitHub repository and crate. It is quite straightforward to implement support for new package formats now.

One just needs to implement the following trait. In essence, it asks for two things: an ordering between packages (highest version number / most preferable package first!), and a dependency matcher (given a version specifier string, find all packages that match it).

pub trait DependencyProvider<VS: VersionSet, N: PackageName = String>: Sized {
    // The only required methods
    fn pool(&self) -> &Pool<VS, N>;
    fn sort_candidates(
        &self,
        solver: &SolverCache<VS, N, Self>,
        solvables: &mut [SolvableId]
    );
    fn get_candidates(&self, name: NameId) -> Option<Candidates>;
    fn get_dependencies(&self, solvable: SolvableId) -> Dependencies;
}

You can find an example package ecosystem implementation in the tests of resolvo.
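As a rough illustration of the two ingredients (an ordering and a matcher), here is a self-contained sketch that does not use resolvo at all. `Version`, `Spec`, `sort_candidates` and `matching_candidates` are hypothetical, heavily simplified stand-ins; real conda and PEP 440 versions and specifiers are far richer than a three-number tuple.

```rust
/// Hypothetical, simplified version: (major, minor, patch).
pub type Version = (u32, u32, u32);

/// Hypothetical specifier, far simpler than PEP 440 or conda match specs.
#[derive(Debug, Clone, Copy)]
pub enum Spec {
    Any,
    AtLeast(Version),
    Below(Version),
}

impl Spec {
    /// The "dependency matcher": does this version satisfy the specifier?
    pub fn matches(&self, v: Version) -> bool {
        match *self {
            Spec::Any => true,
            Spec::AtLeast(min) => v >= min, // tuples compare lexicographically
            Spec::Below(max) => v < max,
        }
    }
}

/// The "ordering": most preferable (highest) version first.
pub fn sort_candidates(candidates: &mut [Version]) {
    candidates.sort_by(|a, b| b.cmp(a));
}

/// Return all candidates matching the spec, most preferable first.
pub fn matching_candidates(all: &[Version], spec: Spec) -> Vec<Version> {
    let mut found: Vec<Version> = all
        .iter()
        .copied()
        .filter(|v| spec.matches(*v))
        .collect();
    sort_candidates(&mut found);
    found
}
```

With these two pieces, a solver can ask "which versions satisfy this requirement?" and try the returned candidates front to back, so that the highest matching version is considered first.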

Step 2: Make the solver lazy

As described earlier, the metadata of Python packages is hidden inside the wheel files. This makes it impractical to obtain all available metadata right away, as one would need to download many packages (and individual packages can be quite large!). pip works around this by being “lazy”: it only retrieves the files that are necessary and expands the tree of packages when needed. After some head scratching and watching some lectures about CDCL, Bas & Tim found a way to make the solver lazy while keeping it fast in the non-lazy case – but that’s a topic for a future blog post.
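The general idea of fetching metadata on demand can be sketched as a cache that calls an expensive fetch function only the first time a package is reached. This is not how resolvo integrates laziness into its CDCL solver (that is left for the future post); `LazyMetadata`, `reachable` and the fetch closure are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Caches per-package dependency lists and fetches them only on first use.
pub struct LazyMetadata<F: FnMut(&str) -> Vec<String>> {
    fetch: F,
    cache: HashMap<String, Vec<String>>,
    /// Number of times the (expensive) fetch function was called.
    pub fetch_count: usize,
}

impl<F: FnMut(&str) -> Vec<String>> LazyMetadata<F> {
    pub fn new(fetch: F) -> Self {
        Self { fetch, cache: HashMap::new(), fetch_count: 0 }
    }

    /// Return the dependencies of `package`, fetching them if not yet cached.
    pub fn dependencies(&mut self, package: &str) -> &[String] {
        if !self.cache.contains_key(package) {
            let deps = (self.fetch)(package);
            self.fetch_count += 1;
            self.cache.insert(package.to_string(), deps);
        }
        &self.cache[package]
    }
}

/// Walk the dependency tree from `root`; only packages that are actually
/// reached have their metadata fetched. Returns the names in sorted order.
pub fn reachable<F: FnMut(&str) -> Vec<String>>(
    meta: &mut LazyMetadata<F>,
    root: &str,
) -> Vec<String> {
    let mut seen = vec![root.to_string()];
    let mut queue = vec![root.to_string()];
    while let Some(pkg) = queue.pop() {
        // Copy the list so the borrow of `meta` ends before the next lookup.
        for dep in meta.dependencies(&pkg).to_vec() {
            if !seen.contains(&dep) {
                seen.push(dep.clone());
                queue.push(dep);
            }
        }
    }
    seen.sort();
    seen
}
```

In this sketch the fetch closure stands in for downloading a wheel (or its PEP 658 metadata file). Packages in the index that are never reached from the root are never fetched, which is the property that makes the lazy approach practical.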

Step 3: Reading the metadata

Finally, we were lucky in that we could reuse quite a bit of existing software: code from the Posy project, another Rust-based pip install & resolve project, as well as Konstin’s awesome work on the pep440_rs and pep508_rs crates. The pep440_rs crate already implements the version matching for Python version strings and thus made it quite straightforward to plug the PyPI ecosystem into resolvo. We also added a high-performance wheel cache and the ability to actually unpack and install the wheel files.

We’ve built rip in a similar fashion to rattler: as low-level libraries that we want to use in pixi! We’ve already begun the integration and are excited about all the other real-world use cases that this integration will unlock!

Combining this work results in a fast, type-safe PyPI package solver. Here is an example of rip solving the requirements for flask.

[Image: flask-install]