Binary fuse filters use less memory and build faster than any previous probabilistic filter — getting within 13% of the theoretical minimum while keeping blazing query speed.
Imagine you're a librarian at a library with billions of books. Every minute, someone walks up and asks: "Do you carry this book?" Checking the full catalog takes time. But what if you had a magic cheat sheet — a tiny note card that could almost always tell you "definitely no" or "probably yes" in a split second?
In computing, this "cheat sheet" is called a probabilistic filter — a data structure that gives instant answers about whether an item is in a set, using far less memory than storing the actual set. The tradeoff: it sometimes says "yes" when the answer is really "no" (a false positive), but it never says "no" when the answer is "yes".
Engineers use these filters everywhere: databases avoid expensive disk reads, networks skip unnecessary data transfers, web browsers check URLs against safe-browsing lists. The better the filter, the less memory wasted and the fewer false alarms.
There's a mathematical limit on how small any filter can be. To guarantee a false-positive rate of ε, you need at least \(\log_2(1/\epsilon)\) bits per stored key. That's the floor — you can't go lower. Everything else is overhead.
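As a sanity check, the bound is easy to compute. A short Python sketch (the function name is ours):

```python
import math

def min_bits_per_key(epsilon):
    """Information-theoretic lower bound: log2(1/eps) bits per stored key."""
    return math.log2(1.0 / epsilon)

# A 1% false-positive rate needs at least ~6.64 bits/key;
# an 8-bit-fingerprint rate of 2^-8 needs exactly 8 bits/key.
for eps in (0.01, 2**-8, 2**-16):
    print(f"eps={eps:.8f}  ->  {min_bits_per_key(eps):.2f} bits/key")
```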
Binary fuse filters get closer to the theoretical minimum than anything before them — while building faster and querying just as quickly.
The Bloom filter is the classic. Think of it as a big grid of light switches, all initially OFF. For each item you want to remember, you flip k specific switches to ON (chosen by hash functions). To check an item, you look at its k switches — if they're all ON, the item might be there. If any is OFF, it's definitely not.
Drawback: The more accurate you want it, the more switches you need to check per query. For a 1% false-positive rate, that's 7 switches. For 0.01%, it's 14. Each "switch check" is a random memory access — and those are slow. Plus, Bloom filters waste 44% more space than the theoretical minimum.
Variant — Blocked Bloom: Packs all switches for one item into one cache-friendly block. Super fast queries, but uses even more space (~30% more than standard Bloom).
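To make the switch analogy concrete, here is a minimal Bloom filter sketch in Python. The class and its way of slicing k positions out of one SHA-256 digest are illustrative choices, not any particular library's scheme:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash positions per key."""
    def __init__(self, m_bits, k_hashes):
        assert k_hashes <= 8          # one SHA-256 digest yields 8 4-byte chunks
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k positions from one digest (an illustrative choice).
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.k):
            chunk = int.from_bytes(digest[4 * i:4 * i + 4], "little")
            yield chunk % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        # All k switches ON -> "maybe"; any OFF -> "definitely not".
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(key))
```

Usage: `bf = BloomFilter(1024, 7); bf.add("alpha"); "alpha" in bf` returns `True`, while a key that was never added almost always returns `False`.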
Instead of flipping switches, cuckoo filters store fingerprints — short codes that represent each key. It's like giving each person a tiny ID badge number, and storing those badges in a clever two-slot hash table.
Only two memory lookups per query (vs. up to 14 for Bloom), and they're 30–40% above the theoretical minimum for 12-bit and 16-bit configurations. They also let you delete items — something Bloom filters can't do easily.
Xor filters are the direct predecessor of binary fuse filters. They're also fingerprint-based, but use a clever XOR trick: each key maps to three array locations, and XOR-ing the values stored at those three locations gives the key's fingerprint. This means exactly 3 lookups per query, regardless of how low the false-positive rate is.
Xor filters are 23% above the minimum — better than Bloom or cuckoo. But their construction was slow (2× slower than alternatives) and they divided the array into only 3 big segments, limiting memory locality.
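The XOR trick itself fits in a few lines. A toy demonstration with made-up locations and an arbitrary fingerprint value:

```python
# Toy demonstration of the XOR trick (illustrative numbers, not a real filter):
B = [0] * 8                  # fingerprint array
h0, h1, h2 = 1, 4, 6         # the key's three hash locations
fingerprint = 0xAB

# Construction sets ONE free slot so the three values XOR to the fingerprint:
B[h2] = fingerprint ^ B[h0] ^ B[h1]

# A query then reads all three slots and XORs them:
assert B[h0] ^ B[h1] ^ B[h2] == fingerprint
```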
Ribbon filters are also fingerprint-based but use Gaussian elimination (a linear algebra technique) to solve for fingerprint values. They're very configurable — you can trade construction time for space efficiency. Construction is fast, but queries are slower than with xor or fuse filters.
The technical name for this change is spatial locality. Instead of dividing the fingerprint array into 3 large segments (as xor filters do), binary fuse filters divide it into hundreds of small, non-overlapping segments. Each key maps to three locations within three consecutive segments.
The segment sizes are constrained to be powers of two (2, 4, 8, 16, ...). This isn't just a nice number — it means the computer can compute hash positions using ultra-fast bitwise operations instead of slow division. The name "binary" comes from this power-of-two requirement.
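A small sketch of why powers of two matter (the segment length here is an assumed example value):

```python
SEGMENT_LENGTH = 256                    # must be a power of two
SEGMENT_MASK = SEGMENT_LENGTH - 1       # 0b11111111

def offset_in_segment(h):
    # With a power-of-two segment length, 'h % SEGMENT_LENGTH' compiles down
    # to a single AND instruction instead of a slow division.
    return h & SEGMENT_MASK

# The two expressions agree for any hash value:
for h in (0, 1, 255, 256, 987654321):
    assert offset_in_segment(h) == h % SEGMENT_LENGTH
```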
The term "fuse" comes from theoretical work by Dietzfelbinger and Walzer on fuse graphs. Think of a fuse as a chain where removing one link triggers a cascade — similar to how the filter's construction "peels" away keys one by one.
In the 3-wise version, each key maps to 3 array locations (just like xor filters). For large sets, the array only needs 1.125× n entries instead of 1.23× n — saving ~10% immediately.
The 4-wise version maps each key to 4 locations. The array shrinks further to 1.075× n entries. The query reads 4 values instead of 3, costing about 30% more time — but the space saving is substantial.
Building a binary fuse filter is like solving a clever puzzle. Here's the step-by-step process, simplified:
Hash every key to find which segment it starts in, then partially sort keys by segment. This is a single linear-time pass — cheap and fast.
For each array slot, count how many keys map to it. Because keys are sorted by segment, this scan moves forward through memory — great for the CPU cache.
Find slots with exactly one key mapped to them. "Peel" that key away — remove it from all three of its slots. This may reveal new singletons, creating a cascade (like toppling dominoes).
Walk through the peeled keys in reverse order. For each key, set one array slot's value so that XOR-ing all three (or four) slot values produces the key's fingerprint. This always works because of the peeling order.
If the peeling gets stuck (less than 1% of the time), just pick new hash functions and retry. Expected number of attempts: about 1.01.
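The steps above can be sketched end to end. This is a simplified, xor-filter-style peel-and-assign construction in Python: it keeps the three-segment layout, the peeling cascade, and the retry loop, but omits the paper's segment-sorting pass and tuned sizing. All helper names and the sizing constant are ours:

```python
import random

def _locations(key, salts, seg_len):
    # One location per segment, so a key's three slots can never collide
    # (xor filters use 3 big segments; binary fuse filters use many small ones).
    return [i * seg_len + hash((key, salts[i])) % seg_len for i in range(3)]

def _fingerprint(key, salts):
    return hash((key, salts[3])) & 0xFF          # 8-bit fingerprint

def construct(keys, seed=0):
    """Simplified 3-wise peel-and-assign construction (a sketch, not the
    paper's tuned layout). Returns the array plus the hashing parameters."""
    n = len(keys)
    seg_len = (int(1.23 * n) + 32) // 3 + 1      # generous xor-filter-style sizing
    m = 3 * seg_len
    while True:
        rng = random.Random(seed)
        salts = [rng.getrandbits(64) for _ in range(4)]
        slots = [[] for _ in range(m)]           # which keys map to each slot
        for idx, key in enumerate(keys):
            for p in _locations(key, salts, seg_len):
                slots[p].append(idx)
        # Peel singletons; each peel may expose new singletons (a cascade).
        order, stack = [], [p for p in range(m) if len(slots[p]) == 1]
        while stack:
            p = stack.pop()
            if len(slots[p]) != 1:
                continue
            idx = slots[p][0]
            order.append((idx, p))               # p is this key's "free" slot
            for q in _locations(keys[idx], salts, seg_len):
                slots[q].remove(idx)
                if len(slots[q]) == 1:
                    stack.append(q)
        if len(order) == n:
            break
        seed += 1                                # peeling got stuck: retry
    # Assign fingerprints in reverse peel order; B[p] is still 0 when set.
    B = [0] * m
    for idx, p in reversed(order):
        l0, l1, l2 = _locations(keys[idx], salts, seg_len)
        B[p] = _fingerprint(keys[idx], salts) ^ B[l0] ^ B[l1] ^ B[l2]
    return B, salts, seg_len

def contains(B, salts, seg_len, key):
    l0, l1, l2 = _locations(key, salts, seg_len)
    return B[l0] ^ B[l1] ^ B[l2] == _fingerprint(key, salts)
```

Every inserted key is found, and an absent key slips through only when three stored values happen to XOR to its fingerprint, which for 8-bit fingerprints happens about 1 time in 256.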
Querying is beautifully simple. Given a candidate key: compute its fingerprint and its three array locations, read the three stored values, and XOR them together.
If the result matches the fingerprint → "probably in the set." If not → "definitely not in the set." That's it. Three memory reads, one XOR, one comparison. Modern CPUs can issue all three memory reads simultaneously.
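A sketch of the query path, with toy hash helpers standing in for the fast 64-bit mixers real implementations use (all names and the segment length are assumptions of this sketch):

```python
import hashlib

SEG_LEN = 1 << 10          # power-of-two segment length (binary fuse style)

def _mix(key, salt):
    # Toy hash helper (illustrative; real implementations use fast mixers).
    raw = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(raw, "big")

def locations(key):
    # One location in each of three consecutive power-of-two segments;
    # '& (SEG_LEN - 1)' replaces a slow modulo, hence the "binary" name.
    return [i * SEG_LEN + (_mix(key, i) & (SEG_LEN - 1)) for i in range(3)]

def fingerprint(key):
    return _mix(key, 3) & 0xFF

def query(B, key):
    """The whole membership test: three reads, one XOR chain, one compare."""
    l0, l1, l2 = locations(key)
    return B[l0] ^ B[l1] ^ B[l2] == fingerprint(key)
```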
The array size and segment size are computed from the number of keys n:
3-wise: Array size ≈ \(\left\lfloor \left(0.875 + 0.25\,\max\!\left(1,\frac{\log 10^6}{\log n}\right)\right) n \right\rfloor\), converging to \(1.125n\) for large n.
4-wise: Array size ≈ \(\left\lfloor \left(0.77 + 0.305\,\max\!\left(1,\frac{\log(6 \times 10^5)}{\log n}\right)\right) n \right\rfloor\), converging to \(1.075n\) for large n.
Translation: For large sets, a 3-wise filter adds ~12.5% padding to the array (so an array of 1.125 million slots for 1 million keys). A 4-wise filter adds only ~7.5%. The xor filter needed 23% padding. The segment sizes grow sublinearly — roughly as \(n^{0.58}\) for 3-wise — so the ratio of array size to segment size increases to hundreds for large sets, giving many small neighborhoods.
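The two sizing formulas translate directly to code (function names are ours):

```python
import math

def fuse3_array_size(n):
    """3-wise binary fuse array size, per the paper's formula."""
    return math.floor(n * (0.875 + 0.25 * max(1.0, math.log(1e6) / math.log(n))))

def fuse4_array_size(n):
    """4-wise binary fuse array size, per the paper's formula."""
    return math.floor(n * (0.77 + 0.305 * max(1.0, math.log(6e5) / math.log(n))))

# Small sets pay more padding; large sets converge to 1.125n and 1.075n.
for n in (10_000, 1_000_000, 100_000_000):
    print(f"n={n:>11,}  3-wise: {fuse3_array_size(n)/n:.3f}n"
          f"  4-wise: {fuse4_array_size(n)/n:.3f}n")
```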
Drag the slider to change the false-positive probability and see how much memory each filter type needs per key:
The researchers ran extensive benchmarks on an Intel Skylake i7-6700 (3.4 GHz, 8 MiB L3 cache) and an AMD EPYC 7262 (Rome Zen 2). Sets of 100 million random 64-bit keys, with queries containing 25% in-set keys. All single-threaded, no disk or network access. Let's dig in.
Lower is better. The theoretical minimum is 0%. Here's how much extra each filter uses:
Binary fuse filters dominate: 3-wise sits at ~13% overhead, 4-wise at ~8%. The xor filter wastes 23%, Bloom filters 44–50%, and blocked Bloom is the worst at 60–70%. Ribbon filters are competitive at ~10–15%, but trail in query speed.
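These overhead numbers follow from the sizing factors and the \(\log_2(1/\epsilon)\) floor. For 8-bit fingerprints the floor is 8 bits per key, so a 1.125× array costs 9.0 bits/key, i.e. 12.5% overhead, matching the ~13% figure (a back-of-envelope sketch, not the paper's measurement code):

```python
import math

def overhead(bits_per_key, epsilon):
    """Space overhead relative to the log2(1/eps) lower bound."""
    return bits_per_key / math.log2(1.0 / epsilon) - 1.0

# 8-bit fingerprints give eps ~ 2^-8, so the lower bound is 8 bits/key.
print(f"3-wise fuse: {overhead(1.125 * 8, 2**-8):.1%}")   # 1.125x array
print(f"4-wise fuse: {overhead(1.075 * 8, 2**-8):.1%}")   # 1.075x array
print(f"xor:         {overhead(1.23 * 8, 2**-8):.1%}")    # 1.23x array
```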
Lower is faster. Tested with 100M keys, 25% in-set queries on Intel Skylake:
Blocked Bloom is the speed champion (it uses AVX2 SIMD instructions). But among space-efficient filters, 3-wise binary fuse matches xor filters at ~30 ns — roughly 2× faster than standard Bloom. The 4-wise variant is slightly slower (~40 ns) due to the extra memory read. Ribbon and vector quotient filters trail at 80–150 ns.
Lower is faster. Building the filter from 100M keys on Intel Skylake:
Binary fuse filters build 2× faster than xor filters — a dramatic improvement. This is thanks to the locality-friendly construction: pre-sorting by segment means memory access patterns are sequential rather than random. Cuckoo filters are the slowest to build.
| Filter | Bits / Key | Overhead | Query (ns) | Build (ns) | Verdict |
|---|---|---|---|---|---|
| 3-wise Binary Fuse | 9.0 | ~13% | ~30 | ~36 | ★ Best Overall |
| 4-wise Binary Fuse | 8.6 | ~8% | ~40 | ~30 | ★ Smallest |
| Xor | 9.84 | ~23% | ~30 | ~120 | Superseded |
| Bloom (12-bit) | ~12 | ~44% | ~60 | ~40 | Classic |
| Blocked Bloom | ~16+ | ~60%+ | ~15 | ~12 | Speed King |
| Cuckoo (12-bit) | ~13.5 | ~30% | ~40 | ~120 | Slow Build |
| Standard Ribbon (8-bit) | ~9.2 | ~15% | ~80 | ~65 | Flexible |
| Vector Quotient | ~11.7 | ~46% | ~55 | ~90 | Platform-dependent |
Note: Numbers are approximate, read from the paper's figures. The 16-bit versions double the bits/key and cut the false-positive rate by far more than half (an 8-bit fingerprint gives \(\epsilon \approx 2^{-8}\); a 16-bit fingerprint gives \(\epsilon \approx 2^{-16}\)). Binary fuse filters win for sets over ~10,000 keys; for tiny sets, xor filters may be marginally better.
The paper tests sets from 10,000 to 100 million keys. Key findings on scaling:
The authors explicitly state: "binary fuse filters render xor filters practically obsolete." Same query speed, much less memory, much faster to build.
The paper reveals a clear decision framework:
If space matters most: use 4-wise binary fuse. Only ~8% overhead. Queries are ~30% slower than 3-wise, but the space savings may matter more.
For the common case: use 3-wise binary fuse. 13% overhead with xor-matching query speed. This is the new default recommendation.
If raw query speed is everything: use blocked Bloom with AVX2. Fastest queries, but 60%+ overhead.
If the set changes over time: binary fuse filters require all keys upfront, so use cuckoo filters or Bloom filters for dynamic sets.
Reference implementations are available at github.com/FastFilter/. This chart reproduces Figure 1 from the paper — the theoretical bits per key for each filter type across different false-positive probabilities:
At every false-positive level, binary fuse filters (especially 4-wise) are the most space-efficient option. The gap widens as you demand lower false-positive rates.
Here's what you can confidently tell someone about this paper: