Flickzeug – or why patching source code is hard
TL;DR - we have extended the Rust crate diffy to be able to patch real-world source code with real-world patches, which is surprisingly hard! In the process we renamed it to flickzeug.
Patches are simple text files
I’ve seen many patch files in my life, but never looked at them with particular interest. Syntax highlighting usually showed me some red and green sections that told me more or less what was going to be replaced. I never noticed how fragile the entire format is! The basic text format works like this: a line is added when it starts with a +, removed when it starts with a -, and kept as “context” when it starts with a single space character.
So, first of all, DO NOT REMOVE TRAILING WHITESPACE from patch files please! As you can already see, that would challenge the straightforward parser.
Also, I never thought much about the header format of patch files, nor realized that there are different ones.
A simple example
Let's say you have a file hello.c:
    #include <stdio.h>

    int main() {
        printf("Hello, World!\n");
        return 0;
    }
And you want to change the greeting. A unified diff patch for that looks like this:
    --- a/hello.c
    +++ b/hello.c
    @@ -1,6 +1,6 @@
     #include <stdio.h>
     
     int main() {
    -    printf("Hello, World!\n");
    +    printf("Hello, Prefix!\n");
         return 0;
     }
Looks straightforward, right? Every line starts with exactly one of three characters: a space ( ) for context, a - for removed lines, or a + for added lines. The @@ line is the "hunk header" — it tells you where in the file this change applies. In this case, @@ -1,6 +1,6 @@ means: starting at line 1, take 6 lines from the original, and replace them with 6 lines in the new version.
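To make the hunk header concrete, here is a minimal sketch of how such a line could be parsed in Rust. The names `HunkRange`, `parse_range` and `parse_hunk_header` are hypothetical and chosen for illustration; this is not flickzeug's actual API. It relies on the unified diff convention that an omitted count means a single line.

```rust
/// Line range from one side of a hunk header (hypothetical type, not flickzeug's API).
#[derive(Debug, PartialEq)]
struct HunkRange {
    start: usize,
    len: usize,
}

/// Parses "12,7" or "12" (an omitted count means 1 line).
fn parse_range(s: &str) -> Option<HunkRange> {
    let (start, len) = match s.split_once(',') {
        Some((a, b)) => (a.parse::<usize>().ok()?, b.parse::<usize>().ok()?),
        None => (s.parse::<usize>().ok()?, 1),
    };
    Some(HunkRange { start, len })
}

/// Parses "@@ -1,6 +1,6 @@ optional section text" into (old, new) ranges.
fn parse_hunk_header(line: &str) -> Option<(HunkRange, HunkRange)> {
    let rest = line.strip_prefix("@@ ")?;
    // Everything before the closing " @@" holds the two ranges;
    // whatever follows is free-form section text and is ignored here.
    let (ranges, _section) = rest.split_once(" @@")?;
    let (old, new) = ranges.split_once(' ')?;
    Some((
        parse_range(old.strip_prefix('-')?)?,
        parse_range(new.strip_prefix('+')?)?,
    ))
}
```

Returning `Option` keeps the sketch short; a real parser would likely report which part of the header was malformed.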
The trailing whitespace trap
Now look at the context lines more carefully. See the line between #include and int main()? That's an empty line in the original source. In the patch, it becomes a context line — which means it should start with a space character. So the "empty" line in the patch is actually a line containing a single space:
     #include <stdio.h>
    ·
     int main() {
(Where · represents a space character.)
If your linter strips trailing whitespace, that space disappears, and the line becomes truly empty — which is no longer a valid context line. The patch breaks. This definitely happens in the real world and is something we had to fix in flickzeug.
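One plausible way to tolerate stripped whitespace is to treat a completely empty line inside a hunk as an empty context line. The sketch below shows that idea; `HunkLine` and `classify_hunk_line` are hypothetical names, and the recovery rule is an assumption for illustration rather than a description of flickzeug's internals.

```rust
/// Kind of a line inside a hunk body (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum HunkLine<'a> {
    Context(&'a str),
    Added(&'a str),
    Removed(&'a str),
}

/// Classifies one hunk body line by its first character.
/// A completely empty line is accepted as an empty context line, on the
/// assumption that an editor or linter stripped the single leading space.
fn classify_hunk_line(line: &str) -> Option<HunkLine<'_>> {
    if line.is_empty() {
        return Some(HunkLine::Context(""));
    }
    if let Some(rest) = line.strip_prefix(' ') {
        Some(HunkLine::Context(rest))
    } else if let Some(rest) = line.strip_prefix('+') {
        Some(HunkLine::Added(rest))
    } else if let Some(rest) = line.strip_prefix('-') {
        Some(HunkLine::Removed(rest))
    } else {
        None // not a valid hunk body line
    }
}
```

With this rule, both `" "` and `""` classify as the same empty context line, so a patch that went through a whitespace-stripping linter still parses.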
Different header formats
Not all patches look the same at the top. There are actually several header formats you'll encounter in the wild:
Plain unified diff (the most common format; diff -u produces it, and git diff builds on it):
    --- a/src/lib.rs
    +++ b/src/lib.rs
    @@ -10,7 +10,7 @@
Git extended headers add even more metadata:
    diff --git a/src/lib.rs b/src/lib.rs
    index 1234567..abcdef0 100644
    --- a/src/lib.rs
    +++ b/src/lib.rs
    @@ -10,7 +10,7 @@
In conda-forge recipes, you'll encounter all of these — patches come from upstream projects, from distro packagers, from random GitHub issues. They were generated by different tools, on different operating systems, sometimes edited by hand. A robust patch implementation needs to handle all of them – and figure out whether to strip a/ and b/ from the paths.
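The question of whether to strip a/ and b/ can be approached with a simple heuristic: only strip when both sides carry the prefix, and allow /dev/null on one side for created or deleted files. The function below is a hypothetical sketch of that heuristic, not the rule flickzeug actually uses.

```rust
/// Strips git's "a/" and "b/" prefixes from a pair of header paths, but only
/// when both sides carry them (a heuristic assumed here for illustration).
fn strip_git_prefixes<'a>(old: &'a str, new: &'a str) -> (&'a str, &'a str) {
    const DEV_NULL: &str = "/dev/null"; // marks a created or deleted file
    let old_stripped = old.strip_prefix("a/");
    let new_stripped = new.strip_prefix("b/");
    match (old_stripped, new_stripped) {
        (Some(o), Some(n)) => (o, n),
        (None, Some(n)) if old == DEV_NULL => (old, n),
        (Some(o), None) if new == DEV_NULL => (o, new),
        // Mixed or missing prefixes: leave the paths untouched.
        _ => (old, new),
    }
}
```

A heuristic like this cannot distinguish git prefixes from real directories named a and b, so a more careful tool would probably also check which candidate path exists on disk.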
Older formats we do not support yet
For completeness, there are two more diff formats that flickzeug does not yet support.
"Normal" diff (the oldest format), produced by plain diff:
    3c3
    < old line
    ---
    > new line
"Context" diff (an older format that predates unified diff), produced by diff -c:
    *** old.txt
    --- new.txt
    ***************
    *** 1,4 ****
      Hello, world!
    ! This is a test.
      Line three here.
      Goodbye.
    --- 1,5 ----
      Hello, world!
    ! This is a new test.
      Line three here.
    + Added this line.
      Goodbye.
If there is popular demand, we are, of course, happy to add support for these diff formats. They should map naturally to our internal data structures; only the parser would need to be adjusted.
Hunks and fuzz: when patches don't quite fit
Patches are applied to "moving targets": the source code can change while the patch stays the same. This happens frequently in conda-forge.
To apply patches fuzzily, the tool looks for the context lines in the target file to figure out where to apply the change.
Consider this patch:
    --- a/config.py
    +++ b/config.py
    @@ -15,7 +15,7 @@
     # Database settings
     DB_HOST = "localhost"
     DB_PORT = 5432
    -DB_NAME = "myapp"
    +DB_NAME = "myapp_production"
     DB_USER = "admin"
     DB_PASS = "secret"
The hunk header says this change is at line 15. But what if someone added 10 lines of imports at the top of the file? The context (# Database settings, DB_HOST, etc.) now lives at line 25. Flickzeug will search for the matching context and apply the change at the right offset. This is called offset matching.
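Offset matching can be sketched as a search that starts at the expected position and then alternates backward and forward with growing distance. The functions `matches_at` and `find_with_offset` are hypothetical names for illustration, and positions are 0-based indices into the file's lines rather than the 1-based numbers of the hunk header.

```rust
/// Returns true if `needle` matches `haystack` line for line starting at index `at`.
fn matches_at(haystack: &[&str], needle: &[&str], at: usize) -> bool {
    at + needle.len() <= haystack.len() && haystack[at..at + needle.len()] == *needle
}

/// Finds where a hunk's old-side lines occur in the file, starting at the
/// expected index and alternating backward and forward with growing distance.
fn find_with_offset(file: &[&str], old_lines: &[&str], expected: usize) -> Option<usize> {
    if matches_at(file, old_lines, expected) {
        return Some(expected);
    }
    for distance in 1..=file.len() {
        // Try `distance` lines earlier, if that position exists.
        if let Some(pos) = expected.checked_sub(distance) {
            if matches_at(file, old_lines, pos) {
                return Some(pos);
            }
        }
        // Then try `distance` lines later.
        let pos = expected + distance;
        if matches_at(file, old_lines, pos) {
            return Some(pos);
        }
    }
    None
}
```

Searching outward from the expected position means that, when the same context appears more than once in a file, the occurrence closest to the line number in the hunk header wins.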
But it can get worse. What if the surrounding context has also slightly changed? Maybe DB_PORT was already changed to 3306 by a previous patch. Now the context doesn't match exactly anymore. Tools like patch and git apply support a concept called fuzz — they can ignore a certain number of non-matching context lines at the edges of a hunk and still apply the patch.
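The edge-trimming part of fuzz can be sketched as follows. A hunk is modelled as a list of (marker, text) pairs, and up to `fuzz` context lines are dropped from each end before matching. `Line` and `trim_context` are hypothetical names for illustration; this shows the general idea only, not the exact behavior of patch, git apply or flickzeug.

```rust
/// One hunk body line: (marker, text), where marker is ' ', '+' or '-'.
type Line<'a> = (char, &'a str);

/// Drops up to `fuzz` context lines from the start and from the end of a hunk.
/// Added and removed lines are never dropped.
fn trim_context<'a, 'b>(hunk: &'b [Line<'a>], fuzz: usize) -> &'b [Line<'a>] {
    let leading = hunk
        .iter()
        .take_while(|(marker, _)| *marker == ' ')
        .count()
        .min(fuzz);
    let rest = &hunk[leading..];
    let trailing = rest
        .iter()
        .rev()
        .take_while(|(marker, _)| *marker == ' ')
        .count()
        .min(fuzz);
    &rest[..rest.len() - trailing]
}
```

The trimmed hunk has less context to anchor it, so it is more likely to match in the wrong place. That trade-off is the reason fuzz is usually capped at a small number.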
In flickzeug we implemented a fuzzy matching algorithm for exactly these cases; it is described in the section "What flickzeug handles" below.
Line endings also suck (thanks Windows!)
Another fun edge case: line endings. A patch generated on Linux uses \n. The target file might use \r\n (Windows). Should the patching tool normalize line endings? What if the patch itself has mixed line endings because someone opened it in Notepad on Windows and saved it?
In conda-forge, this is not a theoretical problem. Packages are built across Linux, macOS, and Windows, often from the same set of patches. Getting line ending handling right is essential.
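One common approach, sketched below, is to detect the dominant line ending of the target file and normalize the patch to it before comparing lines. `LineEnding`, `detect_line_ending` and `normalize_line_endings` are hypothetical names, and the majority-vote rule is an assumption for illustration, not necessarily what flickzeug does.

```rust
/// Line ending style detected in a text (hypothetical type for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
enum LineEnding {
    Lf,
    CrLf,
}

/// Picks the dominant line ending by counting "\r\n" against bare "\n".
fn detect_line_ending(text: &str) -> LineEnding {
    let crlf = text.matches("\r\n").count();
    // Every "\r\n" also contains a '\n', so subtract those to get bare ones.
    let lf = text.matches('\n').count() - crlf;
    if crlf > lf {
        LineEnding::CrLf
    } else {
        LineEnding::Lf
    }
}

/// Rewrites every line ending in `text` to `target`, so a patch and its
/// target file can be compared line by line.
fn normalize_line_endings(text: &str, target: LineEnding) -> String {
    let unix = text.replace("\r\n", "\n");
    match target {
        LineEnding::Lf => unix,
        LineEnding::CrLf => unix.replace('\n', "\r\n"),
    }
}
```

Normalizing to the target file's style, rather than always to \n, keeps the patched file consistent with what was already on disk.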
Rant about the patch format
As we've seen, even parsing diffs is pretty difficult. The current text-based diffs are not particularly "machine readable"; in my opinion they could well be replaced by a more structured format such as JSON or TOML.
But for conda-forge, and in general, we have to deal with the messy reality.
Why we built flickzeug
In rattler-build and the broader conda tooling at prefix.dev, we need to apply patches reliably, across platforms, for thousands of packages. Initially we relied on a mix of patch and git apply but found both to be lacking for a solid cross-platform experience. Line endings were particularly annoying.
The existing Rust crate diffy gave us a solid foundation for diffing and patching, but it struggled with many of the real-world edge cases described above:
Patches with missing or malformed headers
Fuzz matching when context has drifted
Mixed line endings
Patches that were hand-edited (and are slightly "wrong")
git-style extended headers and binary patch markers
So we extended diffy substantially, and in the process renamed it to flickzeug (German for "patching kit" — Flicken means to patch, Zeug means kit).
We tested flickzeug across hundreds of real-world conda-forge examples.
What flickzeug handles
Here's a taste of what a real-world patch looks like that flickzeug needs to handle gracefully. This is a slightly simplified version of an actual conda-forge patch:
    diff --git a/CMakeLists.txt b/CMakeLists.txt
    index abc1234..def5678 100644
    --- a/CMakeLists.txt
    +++ b/CMakeLists.txt
    @@ -42,8 +42,10 @@ project(mylib VERSION 2.1.0)
     option(BUILD_SHARED_LIBS "Build shared libraries" ON)
     option(BUILD_TESTS "Build test suite" OFF)
     
    -find_package(OpenSSL REQUIRED)
    -find_package(ZLIB REQUIRED)
    +if(NOT TARGET OpenSSL::SSL)
    +  find_package(OpenSSL REQUIRED)
    +endif()
    +find_package(ZLIB REQUIRED HINTS ${ZLIB_ROOT})
     
     if(UNIX AND NOT APPLE)
       set(CMAKE_INSTALL_RPATH "$ORIGIN/../lib")
This patch has a git extended header, modifies a CMake file (where indentation matters), wraps an existing call in a conditional, and adjusts a find_package call. It was probably written for a specific upstream version but needs to apply cleanly even when the project has moved on a few commits.
flickzeug applies patches using a multi-level fuzzy matching strategy inspired by GNU patch. First, it tries an exact match at the hunk's expected line number, then searches outward alternating backward and forward. If that fails, it increases the "fuzz level" (up to a configurable max of 2), which allows ignoring context lines from the edges of the hunk — just like GNU patch, where fuzz N means up to N context lines can be dropped from the start and end. For non-ignored lines, it uses Levenshtein distance (via strsim) to accept lines that are at least 80% similar, with optional case and whitespace insensitivity. When a fuzzy match is found, the original file's context lines are preserved as-is rather than replaced with the patch's version, so only the actual insertions and deletions are applied — keeping the surrounding code faithful to the target file.
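The 80% similarity check can be illustrated with a small, dependency-free sketch. flickzeug uses the strsim crate for this; the code below is a stand-in that implements Levenshtein distance by hand and assumes similarity is computed as 1 minus the distance divided by the longer line's length. The function names are hypothetical.

```rust
/// Classic two-row Levenshtein edit distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != *cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr.push(substitute.min(delete).min(insert));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Similarity in [0, 1]: 1.0 for identical lines, 0.0 for entirely different ones.
fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0; // two empty lines are identical
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// Accepts a context line if it is at least `threshold` similar to the file line.
fn lines_match(patch_line: &str, file_line: &str, threshold: f64) -> bool {
    similarity(patch_line, file_line) >= threshold
}
```

Under this definition, a line with one extra space still matches at a threshold of 0.8, while the earlier example of DB_PORT = 5432 against DB_PORT = 3306 scores about 0.71 (4 edits over 14 characters) and is rejected.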
flickzeug is designed to handle exactly this kind of messy, real-world patching — the kind that the scientific and packaging ecosystems throw at you every day.
flickzeug is open source (dual licensed under Apache-2 and MIT) and available on crates.io. We'd love your feedback and contributions!
Credits
Flickzeug is based on diffy by Brandon Williams. The initial fork and a lot of robustness testing and improvements in flickzeug were done by Valentin Kharin.