
Introducing Py-Rattler

Written by Tarun Pratap Singh 6 months ago

Introduction

We are very happy to announce the 0.1 release of py-rattler, a companion library to the rattler Rust library that lets you create and manage conda environments. Because it is a companion to rattler, all the functionality provided in rattler will also be available via py-rattler, along with additional conveniences.

What Does Py-Rattler Bring To The Table?

Py-Rattler gives you access to the complete rattler library. One of the most exciting parts is Resolvo, the all-new, fast, generic package resolver written in Rust. There already seems to be some effort from the community to use this solver in the wider conda ecosystem.

How Do You Use Py-Rattler?

Py-Rattler is available on both PyPI and conda. Any package manager that supports either of them can be used to install it.

Pixi

pixi add py-rattler

Pip

pip install py-rattler

Conda

conda install py-rattler -c conda-forge

Mamba

mamba install py-rattler -c conda-forge

Example

The example below takes under 8 seconds to create a brand-new environment without any kind of cache. Pretty cool, isn't it?

Let's go through the code of this simple example to see how to use the library and what you can do with it.

from asyncio import run
from rattler import fetch_repo_data, solve, link
from rattler import Channel, Platform, MatchSpec, VirtualPackage

def download_callback(done, total):
    print(end = "\r")
    print(f'{done/1024/1024:.2f}MiB/{total/1024/1024:.2f}MiB', end = "\r")
    if done == total:
        print()

async def main():
    channel = Channel("conda-forge")

    deps = [
        "python ~=3.12.0",
        "pip",
        "requests 2.31.0",
    ]
    print(f"Creating an environment with: {deps}")

    match_specs = [ MatchSpec(dep) for dep in deps ]

    platforms = [Platform.current(), Platform("noarch")]

    virtual_packages = [p.into_generic() for p in VirtualPackage.current()]

    cache_path = "/tmp/py-rattler-cache"
    env_path = "/tmp/env-path/env"

    print("Started fetching repo_data")
    repo_data = await fetch_repo_data(
        channels = [channel],
        platforms = platforms,
        cache_path = f"{cache_path}/repodata",
        callback = download_callback,
    )
    print("Finished fetching repo_data")

    solved_dependencies = solve(
        specs = match_specs,
        available_packages = repo_data,
        virtual_packages = virtual_packages,
    )
    print("Solved required dependencies")

    await link(
        dependencies = solved_dependencies,
        target_prefix = env_path,
        cache_dir = f"{cache_path}/pkgs",
    )
    print(f"Created environment: {env_path}")

if __name__ == "__main__":
    run(main())

This little example creates a new environment and installs Python, pip and the requests library. Let's walk through it to see how it works.

from asyncio import run
from rattler import fetch_repo_data, solve, link
from rattler import Channel, Platform, MatchSpec, VirtualPackage

We start by importing the run function from Python's asyncio module. Parts of the py-rattler library are async, and we need an easy way to run them. We then import some convenience functions that will do most of the work for us, along with the types these functions need.
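If asyncio is new to you, here is a minimal sketch of the pattern using only the standard library. The fetch_greeting coroutine is a hypothetical stand-in for any awaitable library call; it is not part of py-rattler.

```python
import asyncio


async def fetch_greeting(name: str) -> str:
    # Hypothetical stand-in for an async library call.
    # sleep(0) simply yields control to the event loop once.
    await asyncio.sleep(0)
    return f"hello, {name}"


async def main() -> str:
    # `await` suspends main() until the coroutine finishes,
    # then hands back its return value.
    return await fetch_greeting("rattler")


if __name__ == "__main__":
    # asyncio.run starts an event loop, runs main() to completion,
    # and closes the loop again.
    print(asyncio.run(main()))
```

The same shape applies to the example in this post: async functions are awaited inside main, and main itself is handed to run.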

def download_callback(done, total):
    print(end = "\r")
    print(f'{done/1024/1024:.2f}MiB/{total/1024/1024:.2f}MiB', end = "\r")
    if done == total:
        print()

We then define a simple callback function that reports the download progress of the repodata. The callback accepts two integers as parameters: the number of bytes downloaded so far and the total number of bytes.
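If you want the callback to be a little more defensive, one possible variant separates the formatting from the printing and adds a percentage. This is only a sketch: format_progress is a hypothetical helper, and treating a non-positive total as "size unknown" is an assumption, not documented py-rattler behaviour.

```python
MIB = 1024 * 1024


def format_progress(done: int, total: int) -> str:
    """Build a one-line progress string from byte counts."""
    if total <= 0:
        # Assumption: a non-positive total means the size is unknown,
        # so only the downloaded amount is shown.
        return f"{done / MIB:.2f}MiB"
    percent = 100 * done / total
    return f"{done / MIB:.2f}MiB/{total / MIB:.2f}MiB ({percent:.0f}%)"


def download_callback(done: int, total: int) -> None:
    # "\r" returns the cursor to the start of the line so the next
    # update overwrites the previous one.
    print(format_progress(done, total), end="\r")
    if total > 0 and done >= total:
        # Move to a fresh line once the download is complete.
        print()
```

Keeping the string building in its own function makes it easy to check without having to download anything.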

async def main():
    ...

Next comes our main function, where everything else in this example happens.

channel = Channel("conda-forge")

We start by defining the conda channels from which we will be getting our dependencies. We will be using the conda-forge channel from https://anaconda.org. If you would like to use channels from anywhere else, you can do so easily by passing a ChannelConfig as the second parameter to Channel.
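To get a feel for what a channel name stands for, the sketch below resolves a name against a base URL (often called a channel alias) and builds the repodata URL for one platform. It is a plain-Python illustration of the usual conda channel layout, not py-rattler's implementation; resolve_channel, repodata_url and the DEFAULT_ALIAS value are assumptions made for this example.

```python
# Assumption: the alias that bare channel names are commonly resolved against.
DEFAULT_ALIAS = "https://conda.anaconda.org"


def resolve_channel(name_or_url: str, alias: str = DEFAULT_ALIAS) -> str:
    """Turn a channel name such as 'conda-forge' into a base URL."""
    if "://" in name_or_url:
        # Already a full URL: only normalise the trailing slash.
        return name_or_url.rstrip("/") + "/"
    return f"{alias.rstrip('/')}/{name_or_url.strip('/')}/"


def repodata_url(channel: str, platform: str, alias: str = DEFAULT_ALIAS) -> str:
    """Location of the package index for one channel and platform."""
    return f"{resolve_channel(channel, alias)}{platform}/repodata.json"


if __name__ == "__main__":
    print(repodata_url("conda-forge", "noarch"))
    # The same name resolved against a different, example alias.
    print(resolve_channel("conda-forge", "https://repo.prefix.dev"))
```

Changing the alias is roughly what a custom channel configuration lets you do: the same short name then points at a different server.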

deps = [
    "python ~=3.12.0",
    "pip",
    "requests 2.31.0",
]
print(f"Creating an environment with: {deps}")

match_specs = [ MatchSpec(dep) for dep in deps ]

We then declare the list of dependencies we are going to install in our environment and create a MatchSpec from each of them. There is quite a bit of documentation available on MatchSpec if you want to understand it better.
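The ~= operator in "python ~=3.12.0" is the compatible-release operator: it accepts 3.12.0 and later patch releases, but not 3.13. The sketch below expands such a constraint into explicit bounds. It is a simplified illustration, not how MatchSpec is implemented; compatible_release_range is a hypothetical helper, and it assumes purely numeric, dot-separated versions (no pre-releases, epochs or build strings).

```python
def compatible_release_range(version: str) -> tuple[str, str]:
    """Expand '~=X.Y.Z' into a lower and an upper bound.

    Simplified: assumes numeric, dot-separated version segments.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("~= needs at least two version segments")
    # Drop the last segment, then bump the new last segment by one.
    prefix = parts[:-1]
    upper = prefix[:-1] + [str(int(prefix[-1]) + 1)]
    return f">={version}", f"<{'.'.join(upper)}"


if __name__ == "__main__":
    lower, upper = compatible_release_range("3.12.0")
    print(f"python {lower},{upper}")
```

With a two-segment version such as 2.2, the same rule yields everything from 2.2 up to, but not including, 3.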

platforms = [Platform.current(), Platform("noarch")]

Now, we create a list of platforms for which we will fetch our dependencies. You can create a Platform manually by passing in a platform string, or get the one for the machine you are running on with the current method.
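Platform strings follow the conda subdirectory naming, for example linux-64, osx-arm64 or win-64, plus noarch for platform-independent packages. As a rough illustration of the kind of string that describes the current machine, here is a standard-library sketch. The conda_subdir helper and its lookup tables are assumptions for this example and cover only a few common combinations; it is not how py-rattler detects the platform.

```python
import platform
from typing import Optional

# Assumption: a small, incomplete mapping for common desktop and server setups.
_SYSTEMS = {"Linux": "linux", "Darwin": "osx", "Windows": "win"}
_ARCHES = {"x86_64": "64", "AMD64": "64", "aarch64": "aarch64", "arm64": "arm64"}


def conda_subdir(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Guess the conda platform string for an OS and CPU architecture."""
    # Fall back to the values reported for the running interpreter.
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return f"{_SYSTEMS[system]}-{_ARCHES[machine]}"
    except KeyError:
        raise ValueError(f"unsupported platform: {system}/{machine}") from None


if __name__ == "__main__":
    print(conda_subdir("Linux", "x86_64"))
    print(conda_subdir("Darwin", "arm64"))
```

This is also why the example asks for two platforms: packages for your machine live under its own subdirectory, while pure-Python packages such as requests typically live under noarch.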

virtual_packages = [p.into_generic() for p in VirtualPackage.current()]

We then create a list of GenericVirtualPackage objects, which describe the virtual packages available on the current platform that some dependencies might require. You can also create these manually from a PackageName, a Version and a build string.

cache_path = "/tmp/py-rattler-cache"
env_path = "/tmp/env-path/env"

print("Started fetching repo_data")
repo_data = await fetch_repo_data(
    channels = [channel],
    platforms = platforms,
    cache_path = f"{cache_path}/repodata",
    callback = download_callback,
)
print("Finished fetching repo_data")

We now declare the paths for our cache and environment. These can be anything you want; here we create them in the tmp directory on a Unix machine. We then call the fetch_repo_data convenience function we imported earlier. Note the await when calling the function: it is an async function, and it takes keyword arguments. This downloads the repodata for all the platforms and channels. You can optionally pass in a progress callback function; here, we pass the download callback we created earlier.
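The hard-coded /tmp paths only make sense on Unix. If you want the same layout on any operating system, one option is to build the paths from the system's temporary directory. This is a standard-library sketch; default_paths is a hypothetical helper, and the directory names simply mirror the ones used in this post.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def default_paths(base: Optional[str] = None) -> Dict[str, Path]:
    """Return cache and environment paths under a base directory.

    Falls back to the platform's temporary directory when no base is given.
    """
    root = Path(base) if base is not None else Path(tempfile.gettempdir())
    cache = root / "py-rattler-cache"
    return {
        "repodata": cache / "repodata",  # downloaded channel indexes
        "pkgs": cache / "pkgs",          # downloaded package archives
        "env": root / "env-path" / "env",  # the environment itself
    }


if __name__ == "__main__":
    for name, path in default_paths().items():
        print(f"{name}: {path}")
```

If a function expects plain strings, wrap the values in str() before passing them on.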

solved_dependencies = solve(
    specs = match_specs,
    available_packages = repo_data,
    virtual_packages = virtual_packages,
)
print("Solved required dependencies")

Now that we have the repodata, we can solve our dependencies via the solve convenience function. This function takes our MatchSpec list and the RepoData and works out all the packages we will need to create the environment. It can also optionally take locked_packages, pinned_packages, virtual_packages and strict_channel_priority arguments to further control how the environment is solved.

await link(
    dependencies = solved_dependencies,
    target_prefix = env_path,
    cache_dir = f"{cache_path}/pkgs",
)
print(f"Created environment: {env_path}")

Then, we take this list of solved dependencies and create an environment from it with the link convenience function. It downloads all the packages into cache_dir and creates the environment in the target_prefix directory. This is an async function, so it needs to be awaited.

if __name__ == "__main__":
    run(main())

And we are done with our little program. All that is left is to execute our main function whenever this script is run. That was all it took to create a little environment with py-rattler.

The Future

Py-Rattler is an ongoing effort, and this is just the first release. We are looking to add a lot more functionality in the upcoming releases along with making existing functionality even more robust.

Some of the features that are in the works include but aren't limited to:

  • More fine-grained control over existing types
  • Lockfiles
  • Better progress reporting
  • ...and a lot more

Contributing

Py-Rattler is 100% open-source, and we would love to see you contribute. You can check out the issue tracker on GitHub, or join our Discord server for questions, requests or a casual chat.