WASM + Conda: Revolutionizing Scientific Computing in the Browser
A QuantStack / prefix.dev collaboration has recently revamped the emscripten-forge effort: a custom distribution of packages compiled to WebAssembly. These are regular conda packages that run anywhere Emscripten-compiled WebAssembly can be executed, such as in Node.js or directly in the browser. But why does this matter? Let's dive into the details.
The Big Picture: Scientific Computing in the Browser
Imagine running complex scientific computations or data analyses directly in your web browser, with no server-side processing and no complex installation. That's the goal of emscripten-forge. We're breaking down the barriers between high-performance scientific computing and web-based applications to make powerful tools accessible to a broader audience.
What Are WASM and Emscripten, and Why Do We Need Them?
WebAssembly (WASM) is a low-level bytecode format that runs in web browsers. It is designed to be fast, portable, and safe, which makes it well suited to running high-performance applications in the browser. You can think of it as an assembly language (like ARM64 or x86 assembly) for the web, with the added benefit that the code runs in a tightly sandboxed environment. All major browsers support it, making it a universal target for web applications.
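To make the "bytecode format" idea concrete, here is a minimal Python sketch (our own illustration, not part of emscripten-forge) that hand-assembles a tiny WASM module exporting an add function and then walks its binary layout: the "\0asm" magic number, the format version, and a sequence of sections whose sizes are encoded as unsigned LEB128 integers. ADD_MODULE, read_uleb128 and list_sections are hypothetical names chosen for this example.

```python
# A minimal sketch: hand-assemble a tiny WebAssembly module and list its sections.

# Module exporting add(i32, i32) -> i32, written byte by byte.
ADD_MODULE = bytes([
    0x00, 0x61, 0x73, 0x6D,  # magic number "\0asm"
    0x01, 0x00, 0x00, 0x00,  # binary format version 1 (little-endian u32)
    # type section (id 1): one function signature (i32, i32) -> i32
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7F, 0x7F, 0x01, 0x7F,
    # function section (id 3): one function that uses signature 0
    0x03, 0x02, 0x01, 0x00,
    # export section (id 7): export function 0 under the name "add"
    0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
    # code section (id 10): local.get 0, local.get 1, i32.add, end
    0x0A, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6A, 0x0B,
])

SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "datacount",
}


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:            # high bit clear: this was the last byte
            return result, pos
        shift += 7


def list_sections(module: bytes) -> list[tuple[str, int]]:
    """Validate the header and return (section name, payload size) pairs."""
    if module[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary (bad magic number)")
    version = int.from_bytes(module[4:8], "little")
    if version != 1:
        raise ValueError(f"unsupported binary format version {version}")
    pos, sections = 8, []
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        sections.append((SECTION_NAMES.get(section_id, f"unknown({section_id})"), size))
        pos += size  # skip over the section payload
    return sections


print(list_sections(ADD_MODULE))
# [('type', 7), ('function', 2), ('export', 7), ('code', 9)]
```

Because this layout is fixed by the WebAssembly specification, the same bytes should be accepted by any conforming engine, for example through WebAssembly.instantiate in a browser or in Node.js.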
Emscripten is a toolchain that compiles C and C++ code to WebAssembly. Many scientific computing libraries are written in these languages for performance reasons. By compiling them to WASM, we can run these high-performance libraries directly in web browsers, at high execution speed and with no local installation needed.
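As a rough illustration of that workflow, the Python sketch below drives Emscripten's compiler driver, emcc. Asking emcc for a .js output makes it emit JavaScript glue code plus a sibling .wasm file, which Node.js can then execute. The helper names and the -O2 default are our own choices, and the script simply skips compilation if emcc or node is not on your PATH.

```python
# A minimal sketch: compile a C file with Emscripten and run the result in Node.js.
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

C_SOURCE = """
#include <stdio.h>

int main(void) {
    printf("hello from WebAssembly\\n");
    return 0;
}
"""


def emcc_command(source: Path, output: Path, opt_level: str = "-O2") -> list[str]:
    """Build the emcc invocation; a .js output also produces a sibling .wasm file."""
    return ["emcc", str(source), opt_level, "-o", str(output)]


def compile_and_run(c_code: str) -> Optional[str]:
    """Compile c_code to WASM and run it with Node.js; return None if tools are missing."""
    if shutil.which("emcc") is None or shutil.which("node") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "hello.c"
        output = Path(tmp) / "hello.js"
        source.write_text(c_code)
        subprocess.run(emcc_command(source, output), check=True)
        result = subprocess.run(["node", str(output)], check=True,
                                capture_output=True, text=True)
        return result.stdout


if __name__ == "__main__":
    print(compile_and_run(C_SOURCE) or "emcc and/or node not found on PATH")
```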
The Emscripten-Forge Project: Building the Foundation
Our aim is to prepare a large collection of low-level libraries for building custom WASM applications. We're focusing on foundational libraries such as:
- Compression libraries (zstd, lz, zip)
- XML parsers (expat, xml2)
- Graphics libraries (cairo, pixman, libpng)
- Interpreters (lua, python, and others)
These libraries are the building blocks for complex, high-performance applications that can run in web browsers. They enable developers to create powerful web-based tools for scientific computing, data analysis, and more. For instance, a data scientist could use these libraries to build an interactive data visualization tool that runs entirely in the browser, or a researcher could create a simulation that can be easily shared and run by colleagues without any installation.
Why Not Just Use Pyodide?
You might ask: isn't this just Pyodide? In a way, that's correct. The emscripten-forge project has taken a lot of inspiration from Pyodide, especially for the Python build. However, we have two key differentiators:
- Language Agnostic: We're not limiting ourselves to Python. Our goal is to make this approach work for multiple programming languages, just like conda. We envision a future where you could run R or Ruby code in your browser, opening up new possibilities for data analysis and web development.
- Conda-forge Integration: We aim to make WebAssembly a "normal" target for conda-forge. This means that packages from the vast conda-forge ecosystem could be built for WASM automatically. Imagine having access to thousands of scientific packages, all runnable in your browser!
Recent Upgrades: Making It All Work
While we've had this infrastructure in place for a while, we recently rolled out major upgrades to streamline the process:
- Building WASM packages
- Setting up the WASM compilation environment
- Using the WASM packages
Let's dive into each of these and see how they fit together.
Building WASM Packages: Enter Rattler-build
QuantStack recently moved emscripten-forge from conda-build / boa to rattler-build, a modern alternative written in Rust. This shift is more than just a change in tools; it's a significant upgrade in the build process.
rattler-build is crucial because it:
- Natively supports the emscripten-wasm32 and wasi-wasm32 platforms
- Distinguishes between "build" and "run" dependencies at test time, making it straightforward to test WASM packages
- Uses a cleaner, community-approved recipe format (the conda-v2-recipe spec!)
These features allow us to more efficiently and reliably build conda packages for WebAssembly. For example, the distinction between "build" and "run" dependencies means we can easily install an emulator in the build dependencies and run non-native tests for WASM packages. This ensures that the packages we're building will actually work in a browser environment.
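Conceptually, as we understand the recipe format, a rattler-build test can list requirements under build (installed for the machine running the test) and run (installed for the target platform). The plain-Python sketch below models why that split matters for cross-platform testing; it is not rattler-build's implementation. RecipeTestRequirements, RUNTIMES and plan_test_environments are hypothetical names, and treating "nodejs" as the runtime that executes emscripten-wasm32 code is an assumption made for illustration.

```python
# Hypothetical model of planning a cross-platform test environment.
from dataclasses import dataclass, field

# Assumption: a runtime package (here "nodejs") in the build dependencies can
# execute emscripten-wasm32 binaries on the build machine.
RUNTIMES = {"emscripten-wasm32": "nodejs"}


@dataclass
class RecipeTestRequirements:
    build: list[str] = field(default_factory=list)  # installed for the build machine
    run: list[str] = field(default_factory=list)    # installed for the target platform


def plan_test_environments(reqs: RecipeTestRequirements, build_platform: str,
                           target_platform: str) -> dict[str, list[str]]:
    """Map each platform to the packages installed for it at test time."""
    if build_platform == target_platform:
        # Native test: build and run requirements share one environment.
        return {build_platform: sorted(set(reqs.build) | set(reqs.run))}
    runtime = RUNTIMES.get(target_platform)
    if runtime is None or runtime not in reqs.build:
        raise ValueError(
            f"cannot execute {target_platform} tests on {build_platform}: "
            "add a suitable runtime to the build requirements")
    return {build_platform: sorted(reqs.build), target_platform: sorted(reqs.run)}


reqs = RecipeTestRequirements(build=["nodejs"], run=["qhull"])
print(plan_test_environments(reqs, "linux-64", "emscripten-wasm32"))
# {'linux-64': ['nodejs'], 'emscripten-wasm32': ['qhull']}
```

Without the build/run split, there would be no place to put a tool that must run natively while the package under test targets WASM.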
The new recipe format is not just cleaner; it's more intuitive for contributors. This could lower the barrier for community contributions, potentially accelerating the growth of available WASM packages.
Setting Up the WASM Compilation Environment: Pixi to the Rescue
We've adopted pixi and pixi tasks to simplify the emscripten-forge workflow. Here's why this is a game-changer:
- Easy setup: Just install pixi, clone our repository, and you're ready to go. The days of complex environment setup are over.
- Simplified compilation: A pixi.toml file in the project root provides handy tasks for building packages locally or in CI. This standardizes the build process, reducing errors and inconsistencies.
- Efficient caching: Tasks benefit from caching, so you only build things once. This can significantly speed up the development and testing process.
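The caching idea can be modelled in a few lines. pixi tasks can declare inputs and outputs so that a task is skipped when nothing relevant changed; the Python sketch below is a simplified model of that input-fingerprinting technique, not pixi's actual implementation. fingerprint, run_cached and the JSON cache file are hypothetical names for this example.

```python
# Simplified model of input-based task caching (not pixi's implementation).
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def fingerprint(inputs: list[Path]) -> str:
    """Hash the names and contents of all input files in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(inputs):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_cached(task: str, inputs: list[Path], action: Callable[[], None],
               cache_file: Path) -> bool:
    """Run action only if the inputs changed since the last run; return True if it ran."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    current = fingerprint(inputs)
    if cache.get(task) == current:
        return False  # cache hit: nothing to rebuild
    action()
    cache[task] = current
    cache_file.write_text(json.dumps(cache))
    return True


with tempfile.TemporaryDirectory() as tmp:
    recipe = Path(tmp) / "recipe.yaml"
    cache = Path(tmp) / ".task-cache.json"
    recipe.write_text("version: 1\n")
    results = [run_cached("build", [recipe], lambda: None, cache)]      # first run
    results.append(run_cached("build", [recipe], lambda: None, cache))  # unchanged
    recipe.write_text("version: 2\n")
    results.append(run_cached("build", [recipe], lambda: None, cache))  # input changed
print(results)  # [True, False, True]
```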
Here's how simple it is to compile a recipe to WASM:
- Install pixi:
curl -sSL https://pixi.sh/install.sh | bash
- Clone the emscripten-forge recipe repository:
git clone https://github.com/emscripten-forge/recipes
- Run:
pixi run build-emscripten-wasm32-pkg ./recipes/recipes_emscripten/qhull
This single command does a lot behind the scenes: it installs multiple environments, gets the correct version of the emscripten compiler and rattler-build, and executes the build. Thanks to task-caching, you'll only have to build the emscripten-compiler package once, saving time on subsequent builds.
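If you want to build several recipes in one go, a thin wrapper around the same pixi command is enough. The Python sketch below assumes it is run from the root of the cloned recipes repository; build_commands and build_all are hypothetical helpers, and only the task name comes from the command shown above. It defaults to a dry run that just prints the commands.

```python
# Hypothetical helper: build several emscripten-forge recipes with the same pixi task.
import subprocess
from pathlib import Path

TASK = "build-emscripten-wasm32-pkg"  # task name used in the recipes repository


def build_commands(recipe_dirs: list[Path]) -> list[list[str]]:
    """One 'pixi run <task> <recipe>' invocation per recipe directory."""
    return [["pixi", "run", TASK, str(recipe)] for recipe in recipe_dirs]


def build_all(recipe_dirs: list[Path], dry_run: bool = True) -> None:
    """Print each command; execute it (stopping at the first failure) unless dry_run."""
    for cmd in build_commands(recipe_dirs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


build_all([Path("recipes/recipes_emscripten/qhull")])
# pixi run build-emscripten-wasm32-pkg recipes/recipes_emscripten/qhull
```

Since the builds go through the same pixi task, the emscripten compiler package is still built only once and reused by every later invocation.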
Using the WASM Packages: JupyterLite Integration
Here's where it all comes together. We can now create complete JupyterLite instances with pre-bundled WASM packages from a single pixi.toml file. This means:
- You can run Jupyter notebooks entirely in your browser
- All the scientific computing libraries are pre-loaded
- You have full control over what users receive in the browser
The pixi.toml file creates two environments:
- One with dependencies that run on your machine (the bundler and the server for JupyterLite)
- Another with all the WASM dependencies, which JupyterLite bundles and ships to the browser as a single blob
When you run pixi run serve, it executes three tasks:
- It uses pixi to install all WASM dependencies as an environment. The pixi.lock file acts as a cache indicator, so this step is skipped if the dependencies are already installed.
- It runs jupyter lite build, creating an index.html file with everything needed in the _output folder. This step is cached based on whether the _output folder exists and whether the pixi.lock file has changed.
- Finally, it runs jupyter lite serve, spinning up a simple HTTP server to render JupyterLite.
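The skip conditions of that three-step chain can be summarized in a small, hypothetical Python model. It is not the actual pixi.toml: serve and lock_digest are names we made up, "installing" is only recorded rather than performed, "building" writes a placeholder index.html, and the dictionary state stands in for whatever record the real tooling keeps of the last pixi.lock it saw.

```python
# Simplified, hypothetical model of the install -> build -> serve chain.
import hashlib
import shutil
import tempfile
from pathlib import Path


def lock_digest(workdir: Path) -> str:
    """Fingerprint of pixi.lock; a change means the dependencies changed."""
    return hashlib.sha256((workdir / "pixi.lock").read_bytes()).hexdigest()


def serve(workdir: Path, state: dict) -> list[str]:
    """Run the chain and return the names of the steps that actually executed."""
    executed = []
    current = lock_digest(workdir)
    lock_changed = state.get("lock") != current

    # Step 1: (re)install the WASM environment only when pixi.lock changed.
    if lock_changed:
        executed.append("install")
        state["lock"] = current

    # Step 2: rebuild the site when _output is missing or pixi.lock changed.
    output = workdir / "_output"
    if lock_changed or not output.exists():
        output.mkdir(exist_ok=True)
        (output / "index.html").write_text("<!-- placeholder for the built site -->")
        executed.append("build")

    # Step 3: always serve (here we only record the step).
    executed.append("serve")
    return executed


with tempfile.TemporaryDirectory() as tmp:
    workdir, state = Path(tmp), {}
    (workdir / "pixi.lock").write_text("numpy 1.26\n")
    runs = [serve(workdir, state)]         # fresh checkout
    runs.append(serve(workdir, state))     # nothing changed
    shutil.rmtree(workdir / "_output")
    runs.append(serve(workdir, state))     # output folder deleted
    (workdir / "pixi.lock").write_text("numpy 2.0\n")
    runs.append(serve(workdir, state))     # dependencies changed
for steps in runs:
    print(steps)
```

Only a changed lock file triggers the expensive install step, while a missing _output folder triggers just the rebuild.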
This creates a seamless environment for browser-based scientific computing and data analysis, with the added benefit of easy sharing and reproducibility.
Deploy This on GitHub Pages
We also prepared an example of how to deploy your JupyterLite instance to GitHub Pages! This is a super-convenient way of sharing your work with others. You can find the example here and the deployed version here.
The example contains a GitHub Actions workflow that builds and bundles the JupyterLite application. You can easily adapt it to your own needs.
The Road Ahead
Our experiments have shown that it's possible to build and use WebAssembly packages just like "normal" packages. This opens up exciting possibilities, especially in computer science education. Imagine students being able to run complex simulations or data analyses directly in their browsers, without needing to set up local environments or rely on university servers.
We're looking forward to collaborating with other ecosystems, companies, and institutions to make emscripten-forge the go-to place for WASM packages. Whether you're a data scientist looking to share interactive analyses, an educator wanting to create engaging online courses, or a developer aiming to push the boundaries of web applications, there's potential here to transform how we do computing in the browser.
Get Involved
We welcome contributions in various areas:
- Porting new libraries to WASM for emscripten-forge
- Improving the build processes and helping us support the latest and greatest emscripten versions
- Creating example projects and use cases
Please join our Discord server to ask questions and share feedback!