Cover image for The New Rattler Resolver

The New Rattler Resolver

Wolf Vollprecht
Written by Wolf Vollprecht 9 months ago

At prefix.dev we're invested in the conda ecosystem. A key part of conda is the SAT solver. SAT stands for "Boolean Satisfiability". This class of solvers makes it possible to solve complicated dependency problems - for example if two packages require different ranges of a common dependency, the SAT solver will figure out the working solution (if it exists).

The size of the conda-forge repository makes this a pretty challenging task for regular SAT solvers. When mamba was started, the "key innovation" was to use a specialized solver for package management, called libsolv. Over time, and with the growth of the conda-forge ecosystem, the original solver for the conda package manager had become a severe bottleneck (solve times of ~minutes where commonly seen, with sometimes even reaching hours, far too long for enjoyable use).

Libsolv has offered us great performance, and it is also what we used in our low-level library for handling conda packages – rattler – that is using some custom bindings to the libsolv C library. However, one disadvantage of libsolv has always been that it is quite difficult to read – the code is very expert level C code that implements many low-level tricks for extra performance. Additionally, we had some trouble getting some PRs into libsolv. And lastly, the way libsolv is designed is completely not thread safe, which makes use cases with multi-threaded solving impossible.

For these reasons we made the difficult decision to try to create a port of libsolv in Rust. And luckily we are able to report that it has succeeded! Along the way we discovered some interesting theories about SAT solvers. For example, we found that libsolv implements many of the best practices for implementing high-performance SAT solvers, such as "conflict-driven clause learning" (CDCL) and 2-literal watching (a cool trick to make decisions very quickly).

Our port does not implement all features of libsolv (there are some features specific to RPM that we skipped, such as recommends, suggests, and obsoletes). It does implement everything needed for conda packages: dependencies (with full matchspec support), constrains (for optional dependencies), and most importantly beautiful error messages when no solution can be found.

You can find our port of libsolv (rattler_libsolv_rs) on Github!

SAT solving for package management

A SAT solver knows about "rules" and "literals". Each literal corresponds to one specific version of a package. Rules are the constraints that are imposed on the literals. For example, if package A requires package B, then there is a rule that says "if A is installed, then B must be installed".

The two main rule classes:

Each package name can only exist once: this means package A1 excludes all other versions of A2..N Package A requires package B: this means that if A1 is installed, then B1..N must be installed

Now the SAT solver takes all the rules and tries to find a solution. A regular SAT solver only looks for any solution to the problem – however, in the package management case we are also interested in finding a solution that maximizes package versions. For example, if we have a rule that says "A requires B", and we have two versions of A and two versions of B, then we want to find a solution that installs the highest version of A and the highest version of B.

The way we achieve this is by sorting the packages by version and testing the highest versions first. This is also the same "trick" that libsolv uses.

A key insight was that libsolv implements “conflict-driven clause learning” (CDCL), a state of the art technique for fast SAT solving. For some internal parts, porting was made easier by referencing the MiniSAT project. MiniSAT is a SAT solver with state-of-the-art performance that also implements CDCL. It aims to have a readable, understandable code base and comes with some pretty good explanatory papers.

We added loads of comments throughout the source code to help our future selves and other developers along.

CDCL Image description: The key parts of conflict driven clause learning: “learning” a conflicting clause that is always true (two packages that exclude each other) and backtracking to an earlier place in the decision tree. Image taken from Wikipedia.

Performance

After some performance optimizations we are happy to report that the Rust port of libsolv has excellent performance. Compared to the original libsolv C implementation, it sometimes wins and sometimes loses, putting it at least on par performance wise according to our benchmark set (using the conda-forge repository).

libsolv-clibsolv-rs
python=3.97.3734 ms4.5831 ms
xtensor, xsimd5.7521 ms2.7643 ms
tensorflow654.38 ms371.59 ms
quetz1.2577 s1.3807 s
tensorboard=2.1.1, grpc-cpp=1.39.1474.76 ms132.79 ms

Note

Note: Run cargo bench libsolv after checking out the rattler project to check the results on your own machine.

The results of both solvers are always valid. For both, libsolv and our port, the order of package traversal matters which is why one can sometimes encounter different, but valid results when comparing both implementations. The same holds true for conda. One can always further constrain the problem by adding additional specs.

Error Messages

The libsolv Rust port supports error messages which are heavily inspired by the mamba error messages (recommended read!). Our implementation supports a very similar style of error messages (with the additional benefit that it’s completely memory safe in Rust).

Error Messages

Future Work

The new libsolv-rs is already the default in our new pixi package manager. We are planning quite some future work for our libsolv port. First of all, we want to make it more generic, so that different packaging ecosystems can easily plug their own metadata. This would be modeled after the generic implementation of pubgrub-rs and similar libraries.

We also want to test & extend our implementation with different packaging ecosystems. The first one is the pip/python one. We are also trying to connect to the Fedora / openSUSE / DNF communities to see if they are open to collaborating on a RPM solving prototype.

Acknowledgements

Most of the work of bringing libsolv-rs to live was carried out by Adolfo Ochagavía. The work was partially funded by NumFOCUS in the form of a Small Development Grant under the “conda”-organisation, for which we are very grateful. Amazing performance optimisations and a ton of the base work in rattler (matchspecs, version ordering, …) was done by Bas Zalmstra.