The New Rattler Resolver
At prefix.dev we're invested in the conda ecosystem. A key part of conda is the SAT solver. SAT stands for "Boolean Satisfiability". This class of solvers makes it possible to solve complicated dependency problems - for example if two packages require different ranges of a common dependency, the SAT solver will figure out the working solution (if it exists).
The size of the conda-forge repository makes this a pretty challenging task for regular SAT solvers. When mamba was started, the "key innovation" was to use a specialized solver for package management, called
Over time, and with the growth of the conda-forge ecosystem, the original solver for the conda package manager had become a severe bottleneck (solve times of ~minutes where commonly seen, with sometimes even reaching hours, far too long for enjoyable use).
Libsolv has offered us great performance, and it is also what we used in our low-level library for handling conda packages – rattler – that is using some custom bindings to the libsolv C library. However, one disadvantage of libsolv has always been that it is quite difficult to read – the code is very expert level C code that implements many low-level tricks for extra performance. Additionally, we had some trouble getting some PRs into libsolv. And lastly, the way libsolv is designed is completely not thread safe, which makes use cases with multi-threaded solving impossible.
For these reasons we made the difficult decision to try to create a port of libsolv in Rust. And luckily we are able to report that it has succeeded! Along the way we discovered some interesting theories about SAT solvers. For example, we found that libsolv implements many of the best practices for implementing high-performance SAT solvers, such as "conflict-driven clause learning" (CDCL) and 2-literal watching (a cool trick to make decisions very quickly).
Our port does not implement all features of libsolv (there are some features specific to RPM that we skipped, such as recommends, suggests, and obsoletes). It does implement everything needed for conda packages: dependencies (with full matchspec support), constrains (for optional dependencies), and most importantly beautiful error messages when no solution can be found.
You can find our port of libsolv (rattler_libsolv_rs) on Github!
A SAT solver knows about "rules" and "literals". Each literal corresponds to one specific version of a package. Rules are the constraints that are imposed on the literals. For example, if package A requires package B, then there is a rule that says "if A is installed, then B must be installed".
The two main rule classes:
Each package name can only exist once: this means package
A1 excludes all other versions of
Package A requires package B: this means that if
A1 is installed, then
B1..N must be installed
Now the SAT solver takes all the rules and tries to find a solution. A regular SAT solver only looks for any solution to the problem – however, in the package management case we are also interested in finding a solution that maximizes package versions. For example, if we have a rule that says "A requires B", and we have two versions of A and two versions of B, then we want to find a solution that installs the highest version of A and the highest version of B.
The way we achieve this is by sorting the packages by version and testing the highest versions first. This is also the same "trick" that libsolv uses.
A key insight was that libsolv implements “conflict-driven clause learning” (CDCL), a state of the art technique for fast SAT solving.
For some internal parts, porting was made easier by referencing the
MiniSAT is a SAT solver with state-of-the-art performance that also implements CDCL.
It aims to have a readable, understandable code base and comes with some pretty good explanatory papers.
We added loads of comments throughout the source code to help our future selves and other developers along.
Image description: The key parts of conflict driven clause learning: “learning” a conflicting clause that is always true (two packages that exclude each other) and backtracking to an earlier place in the decision tree. Image taken from Wikipedia.
After some performance optimizations we are happy to report that the Rust port of libsolv has excellent performance. Compared to the original libsolv C implementation, it sometimes wins and sometimes loses, putting it at least on par performance wise according to our benchmark set (using the conda-forge repository).
|python=3.9||7.3734 ms||4.5831 ms|
|xtensor, xsimd||5.7521 ms||2.7643 ms|
|tensorflow||654.38 ms||371.59 ms|
|quetz||1.2577 s||1.3807 s|
|tensorboard=2.1.1, grpc-cpp=1.39.1||474.76 ms||132.79 ms|
cargo bench libsolv after checking out the
rattler project to check the results on your own machine.
The results of both solvers are always valid. For both, libsolv and our port, the order of package traversal matters which is why one can sometimes encounter different, but valid results when comparing both implementations. The same holds true for conda. One can always further constrain the problem by adding additional specs.
The libsolv Rust port supports error messages which are heavily inspired by the mamba error messages (recommended read!). Our implementation supports a very similar style of error messages (with the additional benefit that it’s completely memory safe in Rust).
The new libsolv-rs is already the default in our new pixi package manager. We are planning quite some future work for our libsolv port. First of all, we want to make it more generic, so that different packaging ecosystems can easily plug their own metadata. This would be modeled after the generic implementation of pubgrub-rs and similar libraries.
We also want to test & extend our implementation with different packaging ecosystems. The first one is the pip/python one. We are also trying to connect to the Fedora / openSUSE / DNF communities to see if they are open to collaborating on a RPM solving prototype.
Most of the work of bringing libsolv-rs to live was carried out by Adolfo Ochagavía. The work was partially funded by NumFOCUS in the form of a Small Development Grant under the “conda”-organisation, for which we are very grateful. Amazing performance optimisations and a ton of the base work in rattler (matchspecs, version ordering, …) was done by Bas Zalmstra.