WASM + Conda: Revolutionizing Scientific Computing in the Browser

Written by Wolf Vollprecht 2 months ago

A QuantStack / prefix.dev collaboration has recently revamped the emscripten-forge effort -- a custom distribution of packages compiled to WebAssembly. These are regular conda packages that run anywhere Emscripten / WebAssembly can be executed, such as with Node.js or directly in the browser. But why does this matter? Let's dive into the details.

The Big Picture: Scientific Computing in the Browser

Imagine running complex scientific computations or data analyses directly in your web browser, with no need for server-side processing or complex installations. That's the goal of emscripten-forge. We're breaking down the barriers between high-performance scientific computing and web-based applications in order to make powerful tools accessible to a broader audience.

What Are WASM and Emscripten, and Why Do We Need Them?

WebAssembly (WASM) is a low-level bytecode format that runs in web browsers. It's designed to be fast, secure, and portable, making it ideal for running high-performance applications in the browser. It's like an "assembly" language (e.g. ARM64 or x86) but for the web, and it additionally runs inside a strict sandbox. It is also supported by all major browsers, making it a universal target for web applications.

Emscripten is a toolchain that compiles C and C++ code to WebAssembly (WASM). Many scientific computing libraries are written in these languages for performance reasons. By compiling them to WASM, we can run these high-performance libraries directly in web browsers – at high execution speed and with no local installation needed.
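To make "bytecode format" a little more concrete, here is a minimal Python sketch that checks whether a blob of bytes starts with the WebAssembly binary header: the four magic bytes "\0asm" followed by a little-endian version number, currently 1. The function name is ours and purely illustrative. It only inspects the header and does not validate the rest of the module.

```python
import struct

WASM_MAGIC = b"\x00asm"   # every WebAssembly binary starts with these 4 bytes
WASM_VERSION = 1          # version of the binary format currently in use


def is_wasm_module(data: bytes) -> bool:
    """Return True if `data` begins with a WebAssembly binary header."""
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        return False
    (version,) = struct.unpack("<I", data[4:8])  # little-endian uint32
    return version == WASM_VERSION


# The smallest valid module is just the 8-byte header with no sections.
EMPTY_MODULE = WASM_MAGIC + struct.pack("<I", WASM_VERSION)

print(is_wasm_module(EMPTY_MODULE))               # True
print(is_wasm_module(b"\x7fELF" + b"\x00" * 4))   # False: a native ELF header
```

The same check can be pointed at any .wasm file produced by a build, for example with is_wasm_module(open(path, "rb").read()).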

The Emscripten-Forge Project: Building the Foundation

Our aim is to prepare a large collection of low-level libraries for building custom WASM applications. We're focusing on foundational libraries such as:

  • Compression libraries (zstd, lz, zip)
  • XML parsers (expat, xml2)
  • Graphics libraries (cairo, pixman, libpng)
  • Interpreters (lua, python, and others)

These libraries are the building blocks for complex, high-performance applications that can run in web browsers. They enable developers to create powerful web-based tools for scientific computing, data analysis, and more. For instance, a data scientist could use these libraries to build an interactive data visualization tool that runs entirely in the browser, or a researcher could create a simulation that can be easily shared and run by colleagues without any installation.

Why Not Just Use Pyodide?

You might ask: isn't this just Pyodide? In a way, yes. The emscripten-forge project has taken a lot of inspiration from Pyodide, especially for the Python build. However, we have two key differentiators:

  1. Language Agnostic: We're not limiting ourselves to Python. Our goal is to make this approach work for multiple programming languages, just like conda. We envision a future where you could run R or Ruby code in your browser, opening up new possibilities for data analysis and web development.
  2. Conda-forge Integration: We aim to make WebAssembly a "normal" target for conda-forge. This means that the vast ecosystem of packages available on conda-forge could automatically build WASM binaries. Imagine having access to thousands of scientific packages, all runnable in your browser!

Recent Upgrades: Making It All Work

While we've had this infrastructure in place for a while, we recently rolled out major upgrades to three parts of the workflow:

  1. Building WASM packages
  2. Setting up the WASM compilation environment
  3. Using the WASM packages

Let's dive into each of these and see how they fit together.

Building WASM Packages: Enter Rattler-build

QuantStack recently moved emscripten-forge from conda-build / boa to rattler-build, a modern alternative written in Rust. This shift is more than just a change in tools; it's a significant upgrade in the build process.

rattler-build is crucial because it:

  • Natively supports emscripten-wasm32 and wasi-wasm32 platforms
  • Distinguishes between "build" & "run" dependencies at test time, making it straightforward to test WASM packages
  • Uses a cleaner, community-approved recipe format (the conda-v2-recipe spec!)

These features allow us to more efficiently and reliably build conda packages for WebAssembly. For example, the distinction between "build" and "run" dependencies means we can easily install an emulator in the build dependencies and run non-native tests for WASM packages. This ensures that the packages we're building will actually work in a browser environment.

The new recipe format is not just cleaner; it's more intuitive for contributors. This could lower the barrier for community contributions, potentially accelerating the growth of available WASM packages.

Setting Up the WASM Compilation Environment: Pixi to the Rescue

We've adopted pixi and pixi tasks to simplify the emscripten-forge workflow. Here's why this is a game-changer:

  1. Easy setup: Just install pixi, clone our repository, and you're ready to go. The days of complex environment setup are over.
  2. Simplified compilation: A pixi.toml file in the project root provides handy tasks for building packages locally or in CI. This standardizes the build process, reducing errors and inconsistencies.
  3. Efficient caching: Tasks benefit from caching, so you only build things once. This can significantly speed up the development and testing process.

Here's how simple it is to compile a recipe to WASM:

  1. Install pixi: curl -sSL https://pixi.sh/install.sh | bash
  2. Clone the emscripten-forge recipe repository: git clone https://github.com/emscripten-forge/recipes
  3. From the root of the cloned repository (cd recipes), run: pixi run build-emscripten-wasm32-pkg ./recipes/recipes_emscripten/qhull

This single command does a lot behind the scenes: it installs multiple environments, gets the correct version of the emscripten compiler and rattler-build, and executes the build. Thanks to task-caching, you'll only have to build the emscripten-compiler package once, saving time on subsequent builds.
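If you want to script these steps, for example to build several recipes in a loop, a thin Python wrapper is enough. The sketch below is only an illustration: the function names are hypothetical, it assumes git and pixi are already on your PATH, and it assumes the pixi task is run from the root of the cloned repository, where the pixi.toml lives.

```python
import shutil
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/emscripten-forge/recipes"
TASK = "build-emscripten-wasm32-pkg"


def build_commands(package: str, checkout: Path = Path("recipes")):
    """Return (argv, working directory) pairs for cloning and building one recipe."""
    recipe_dir = f"./recipes/recipes_emscripten/{package}"
    return [
        (["git", "clone", REPO_URL, str(checkout)], Path(".")),
        # Assumption: the task runs from the root of the cloned repository.
        (["pixi", "run", TASK, recipe_dir], checkout),
    ]


def run_build(package: str, checkout: Path = Path("recipes")) -> None:
    """Clone the recipes repository if needed, then build `package` with pixi."""
    if shutil.which("pixi") is None:
        raise RuntimeError("pixi not found on PATH; install it first")
    for argv, cwd in build_commands(package, checkout):
        if argv[0] == "git" and checkout.exists():
            continue  # repository already cloned, skip this step
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_build("qhull") would then perform steps 2 and 3 above. Nothing runs on import, so the helper is safe to load from another script.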

Using the WASM Packages: JupyterLite Integration

Screenshot of the pixi.toml file

Link to example

Here's where it all comes together. We can now create complete JupyterLite instances with pre-bundled WASM packages from a single pixi.toml file. This means:

  1. You can run Jupyter notebooks entirely in your browser
  2. All the scientific computing libraries are pre-loaded
  3. You have full control over what users receive in the browser

The pixi.toml file creates two environments:

  1. One with dependencies that run on your machine (the JupyterLite bundler and server)
  2. Another with all the WASM dependencies, which jupyterlite compiles and ships to the browser as a single blob
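Because the two environments target different platforms, the same Python code may need to know where it is running. A small check on sys.platform is usually enough: CPython reports "emscripten" when built with Emscripten and "wasi" for WASI builds. The helper name below is ours, not part of any package.

```python
import sys


def running_in_wasm() -> bool:
    """Return True when the interpreter itself was compiled to WebAssembly."""
    return sys.platform in ("emscripten", "wasi")


if running_in_wasm():
    print("Running inside WebAssembly")
else:
    print(f"Running natively on {sys.platform}")
```

In a notebook served by JupyterLite this would take the first branch, while the same cell run in a local Jupyter installation would take the second.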

When you run pixi run serve, it executes three tasks:

  1. It uses pixi to install all WASM dependencies as an environment. The pixi.lock file acts as a cache indicator, so this step is skipped if the dependencies are already installed.
  2. It runs jupyterlite build, creating an index.html file with everything needed in the _output folder. This step is cached based on the _output folder's existence and the pixi.lock file's status.
  3. Finally, it runs the jupyterlite serve function, spinning up a simple HTTP server to render JupyterLite.
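The caching rule for the build step can be pictured as a simple check: rebuild if the _output folder is missing, or if pixi.lock has changed since the last build. The Python sketch below illustrates that idea only. It is not how pixi implements task caching, and the stamp file used to remember the last lock-file digest is a hypothetical device for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(output_dir: Path, lock_file: Path, stamp: Path) -> bool:
    """Decide whether the build step has to run again."""
    if not output_dir.is_dir():
        return True   # nothing has been built yet
    if not stamp.is_file():
        return True   # we don't know which lock file produced the output
    # Rebuild only if the lock file differs from the one we built against.
    return stamp.read_text() != file_digest(lock_file)


def mark_built(lock_file: Path, stamp: Path) -> None:
    """Record the lock file digest after a successful build."""
    stamp.write_text(file_digest(lock_file))
```

A wrapper would call needs_rebuild(Path("_output"), Path("pixi.lock"), stamp) before building and mark_built(...) afterwards, so an unchanged lock file and an existing output folder lead straight to the serve step.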

This creates a seamless environment for browser-based scientific computing and data analysis, with the added benefit of easy sharing and reproducibility.

Deploy This on GitHub Pages

We also prepared an example of how to deploy your JupyterLite instance to GitHub Pages! This is a super-convenient way of sharing your work with others. You can find the example here and the deployed version here.

The example contains a GitHub Actions workflow that builds and bundles the JupyterLite application. You can very easily adapt this to your own needs.

The Road Ahead

Our experiments have shown that it's possible to build and use WebAssembly packages just like "normal" packages. This opens up exciting possibilities, especially in computer science education. Imagine students being able to run complex simulations or data analyses directly in their browsers, without needing to set up local environments or rely on university servers.

We're looking forward to collaborating with other ecosystems, companies, and institutions to make emscripten-forge the go-to place for WASM packages. Whether you're a data scientist looking to share interactive analyses, an educator wanting to create engaging online courses, or a developer aiming to push the boundaries of web applications, there's potential here to transform how we do computing in the browser.

Get Involved

We welcome contributions in various areas:

  • Porting new libraries to WASM for emscripten-forge
  • Improving the build process and helping us support the latest Emscripten versions
  • Creating example projects and use cases

Please join our Discord server to ask questions and share feedback!

Resources and Further Reading

  • Emscripten-Forge GitHub Repository: Link
  • Emscripten-Forge Website: Link
  • Pixi example: Link
  • Pixi WASM deployment example: Link and the deployed version: Link
  • Rattler-build Documentation: Link
  • Pixi Documentation: Link