
Introducing Rattler: Conda from Rust

Written by Bas Zalmstra a year ago

Introduction

Packaging is an essential part of software development, and conda packages are an ideal way to distribute cross-platform binaries across multiple fields, from robotics and high-performance computing to data science, machine learning, and university research. Thanks to initiatives like conda-forge and the (micro)mamba package manager, this is backed by a vast open-source community.

At prefix.dev, packaging is at the core of our business. We help our customers get the most out of the conda ecosystem by helping them integrate reliable package management into their workflow, introducing new features to mamba and micromamba, and trying out ideas like creating cacheable Docker images from conda lock files.

We also provide a public service at prefix.dev where we currently host a very fast index of conda packages from different channels. We provide a service to solve and host conda environment files, and we plan on extending this offering with many more awesome features in the near future. Our goal with these public services is to improve the ecosystem in general, so it's easier for people to work with conda.

To be able to do this, we use the Rust programming language. Everything at prefix.dev uses Rust, except for the frontend, which is written in TypeScript. Even so, the frontend is backed by a GraphQL API that is in turn built in Rust.

Using Conda from Rust

We frequently interact with conda in our Rust code:

  • we want to parse, extract, analyze and validate conda packages for our package index,
  • we want to parse and sort versions to display a sorted list of package variants,
  • we want to parse and evaluate "matchspecs" when you navigate for instance to https://prefix.dev/matchspec/python >=3.9?channel=conda-forge,
  • we want to generate efficient cacheable Docker images from lock files, which requires the ability to "link" or install environments,
  • we want to quickly solve conda environments in the cloud.
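
To make the version-sorting and matchspec items above more concrete, here is a deliberately simplified sketch in plain Rust that uses only the standard library. It is not how rattler or conda implement these features: real conda versions also have epochs, alphanumeric segments, and local versions, and real matchspecs support build strings, channels, and globbing. The names `SimpleVersion`, `Op`, `Constraint`, and `parse_constraint` are hypothetical and exist only for this illustration.

```rust
use std::cmp::Ordering;

/// Toy version: dot-separated integers only, e.g. "3.10.2".
/// Trailing zero segments are dropped so that "3.9.0" equals "3.9".
/// The derived ordering compares segments left to right (numerically).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct SimpleVersion(Vec<u64>);

impl SimpleVersion {
    /// Returns `None` if any segment fails to parse as an integer.
    pub fn parse(s: &str) -> Option<SimpleVersion> {
        let mut parts = s
            .trim()
            .split('.')
            .map(|p| p.parse::<u64>().ok())
            .collect::<Option<Vec<u64>>>()?;
        while parts.last() == Some(&0) {
            parts.pop();
        }
        Some(SimpleVersion(parts))
    }
}

/// Comparison operators supported by this toy (an assumption, not the full conda set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Ge,
    Gt,
    Le,
    Lt,
    Eq,
    Ne,
}

/// A single version constraint such as ">=3.9".
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Constraint {
    op: Op,
    version: SimpleVersion,
}

/// Parses an operator followed by a version, e.g. ">=3.9".
pub fn parse_constraint(s: &str) -> Option<Constraint> {
    let s = s.trim();
    // Two-character operators must be tried before one-character ones.
    let table = [
        (">=", Op::Ge),
        ("<=", Op::Le),
        ("==", Op::Eq),
        ("!=", Op::Ne),
        (">", Op::Gt),
        ("<", Op::Lt),
    ];
    for &(prefix, op) in table.iter() {
        if let Some(rest) = s.strip_prefix(prefix) {
            return SimpleVersion::parse(rest).map(|version| Constraint { op, version });
        }
    }
    None
}

impl Constraint {
    /// Returns true if `candidate` satisfies this constraint.
    pub fn matches(&self, candidate: &SimpleVersion) -> bool {
        let ord = candidate.cmp(&self.version);
        match self.op {
            Op::Ge => ord != Ordering::Less,
            Op::Gt => ord == Ordering::Greater,
            Op::Le => ord != Ordering::Greater,
            Op::Lt => ord == Ordering::Less,
            Op::Eq => ord == Ordering::Equal,
            Op::Ne => ord != Ordering::Equal,
        }
    }
}
```

With these types, sorting a `Vec<SimpleVersion>` with `sort()` places "3.9" before "3.10", which a plain string sort would get wrong, and `parse_constraint(">=3.9")` yields a constraint that accepts "3.10" and rejects "3.8.5".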

While several libraries provide some of these functionalities, most of them are written in Python, which isn't easily usable from Rust. To solve environments in the cloud, we currently install a conda environment with conda-lock on our runners and run the same tool as a standalone program within the environment.

The mamba-org GitHub organization does provide libmamba, a C++ library with an API for mamba. But the API of the library wasn't great and would be a challenge to integrate neatly with Rust. It also lacked proper data types for essential files in conda archives (about.json, index.json), parsing of the repodata.json format, matchspec parsing and evaluation, and version ordering. This isn’t an issue for mamba as most of it is delegated to libsolv. But for our use cases, we need access to these things.

Note

Some of these shortcomings are also being addressed by the amazing team at QuantStack. They are revising mamba to make it easier to incorporate into other projects, thus allowing for a more seamless integration.

Besides our work on prefix.dev, we would also like to help move conda tooling forward. For instance, we’d like to experiment with integrating other Rust projects, like pubgrub, a solver based on the PubGrub algorithm, which could provide an interesting alternative to libsolv. We were also keen on optimizing a lot of the code by making better use of multicore processors and using async where appropriate.

But integrating all of these things in a mixed C++/Rust project seemed infeasible.

Enter Rattler!

To improve the current state of affairs, we embarked on a journey to build rattler - an open-source Rust library for working with the conda ecosystem. The goal of rattler is to enable other programs and libraries to easily interact with the conda ecosystem without being dependent on Python.

Rattler standardizes data types by providing strongly typed native Rust structs for them as well as serialization and deserialization. It also implements some of the "standards" employed by both the mamba and conda tools like how packages are cached. All functionality is exposed with lean thread-safe APIs with the minimal amount of input required (no context or global variables). We believe that this project has the potential to revolutionize the way developers interact with the conda ecosystem, and we are excited to see the possibilities it could bring.

Rattler offers clean, compartmentalized building blocks for package management in the Conda ecosystem. Various crates tackle different issues:

  • rattler_conda_types: A crate that provides data types, serialization, and implementations for commonly used datatypes: Version, MatchSpec, RepoData, Channel, etc.
  • rattler_package_streaming: A crate to download, extract, validate, and cache conda package archives.
  • rattler_repodata_gateway: A crate to download, cache and interact with repodata.json files.
  • rattler_solver: A crate that provides an initial interface to solve environments with different solver implementations (currently only via libsolv; thanks @aochagavia and @tdejager!).
  • rattler_virtual_packages: A crate to efficiently detect virtual packages available on the current system.
  • rattler_shell: A crate to generate activation and deactivation scripts for different shells.
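
As a rough illustration of the kind of problem the last crate in this list addresses, the sketch below generates a minimal activation script for two shells using only the Rust standard library. It is an assumption-laden toy, not rattler_shell's API: `Shell` and `activation_script` are hypothetical names, only a Unix-style `bin` directory is handled, paths are not escaped, and real activation also runs package-provided activation scripts.

```rust
use std::path::Path;

/// Shells supported by this toy sketch (hypothetical, not rattler_shell's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Shell {
    Bash,
    Fish,
}

/// Builds a script that sets CONDA_PREFIX and prepends `<prefix>/bin` to PATH.
/// Note: the prefix is inserted verbatim, with no quoting or escaping of
/// special characters, so this is only suitable as an illustration.
pub fn activation_script(shell: Shell, prefix: &Path) -> String {
    let prefix = prefix.display();
    match shell {
        Shell::Bash => format!(
            "export CONDA_PREFIX=\"{0}\"\nexport PATH=\"{0}/bin:$PATH\"\n",
            prefix
        ),
        Shell::Fish => format!(
            "set -gx CONDA_PREFIX \"{0}\"\nset -gx PATH \"{0}/bin\" $PATH\n",
            prefix
        ),
    }
}
```

Calling `activation_script(Shell::Bash, Path::new("/opt/env"))` returns two `export` lines that could be evaluated in a shell session. Keeping the shell as an explicit parameter, rather than reading global state, mirrors the "no context or global variables" goal described above.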

Rattler is already a versatile library that is able to do much more than just the above. For instance, it can also install an environment from an explicit lock file! For a complete overview, I recommend checking out the README on the GitHub repository or jumping straight to the documentation of all the crates.

Demonstration

To give you an idea of the current state of affairs, here is an example of a small experimental command-line tool to create an environment from scratch using the crates mentioned above.

The experimental command cargo run --release create cowpy is equivalent to micromamba create -p ./prefix cowpy except that it always updates the environment to only contain exactly the specs passed to the command (and their dependencies). micromamba run is used to demonstrate that you can run a binary from the created environment.

The above command downloaded and installed a complete python environment as well!
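
The "only contain exactly the specs passed" behaviour described above can be thought of as diffing the set of currently installed packages against the set the solver asks for. The following is a minimal sketch of that idea in plain Rust; it is not the experimental tool's actual code, and `Transaction` and `plan_transaction` are hypothetical names. Packages are represented as plain strings purely for illustration.

```rust
use std::collections::BTreeSet;

/// What has to change to move an environment to the desired state.
#[derive(Debug, PartialEq, Eq)]
pub struct Transaction {
    /// Installed packages that are no longer wanted.
    pub to_remove: Vec<String>,
    /// Wanted packages that are not installed yet.
    pub to_install: Vec<String>,
}

/// Computes the difference between the installed and the desired package sets.
/// Packages present in both sets are left untouched.
pub fn plan_transaction(
    installed: &BTreeSet<String>,
    desired: &BTreeSet<String>,
) -> Transaction {
    Transaction {
        to_remove: installed.difference(desired).cloned().collect(),
        to_install: desired.difference(installed).cloned().collect(),
    }
}
```

Because `BTreeSet` iterates in sorted order, the resulting lists are deterministic, which makes such a plan easy to log or test.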

Note, however, that our goal is not to replace mamba or conda with rattler. Rather, we seek to provide the tools necessary to facilitate interaction between software, such as workflow tools, IDEs, websites, and the existing conda ecosystem.

Future plans

We are also not sitting still. We want to extend Rattler to:

  • Efficiently solve environments for multiple platforms for our online environment solver.
  • Generate conda-lock files.
  • Install conda environments on our cloud backend.
  • Experiment with different solver implementations.
  • Experiment with different repodata formats.
  • Support optional dependencies.
  • Experiment with a sparse index.
  • Experiment with using conda packages for development in Rust.
  • Experiment with building conda packages.
  • ... and so much more!

We hope that more people will see the benefit of having a set of self-contained libraries to interact with conda outside Python. But most of all, we hope more people will start contributing to rattler. We are but a small company working on this project, but hopefully with contributions from others, we can make Rattler into something that developers will love to use and integrate!

We are proud that rattler recently joined the mamba-org organization. It is an honor to be included next to the fantastic mamba package manager.

If you are new to Rust or conda or are just curious, you might want to check out our list of "good first issues". You can also find us on Discord or via email! There is also the bi-weekly mamba-org call that we attend. We are also looking for people to join our team! Long story short: reach out to us; we are eager to discuss all things related to rattler, conda, and mamba!