Binary fuse filters use less memory and build faster than any previous probabilistic filter — getting within 13% of the theoretical minimum while keeping blazing query speed.
Imagine you're a librarian at a library with billions of books. Every minute, someone walks up and asks: "Do you carry this book?" Checking the full catalog takes time. But what if you had a magic cheat sheet — a tiny note card that could almost always tell you "definitely no" or "probably yes" in a split second?
In computing, this "cheat sheet" is called a probabilistic filter — a data structure that gives instant answers about whether an item is in a set, using far less memory than storing the actual set. The tradeoff: it sometimes says "yes" when the answer is really "no" (a false positive), but it never says "no" when the answer is "yes".
Engineers use these filters everywhere: databases avoid expensive disk reads, networks skip unnecessary data transfers, web browsers check URLs against safe-browsing lists. The better the filter, the less memory wasted and the fewer false alarms.
There's a mathematical limit on how small any filter can be. To guarantee a false-positive rate of ε, you need at least \(\log_2(1/\epsilon)\) bits per stored key. That's the floor — you can't go lower. Everything else is overhead.
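As a sanity check, the bound is easy to compute. A short Python sketch (the function name is ours):

```python
import math

def min_bits_per_key(epsilon):
    """Information-theoretic lower bound: log2(1/eps) bits per stored key."""
    return math.log2(1.0 / epsilon)

# A 1% false-positive rate needs at least ~6.64 bits/key;
# an 8-bit-fingerprint rate of 2^-8 needs exactly 8 bits/key.
for eps in (0.01, 2**-8, 2**-16):
    print(f"eps={eps:.8f}  ->  {min_bits_per_key(eps):.2f} bits/key")
```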
Binary fuse filters get closer to the theoretical minimum than anything before them — while building faster and querying just as quickly.
The Bloom filter is the classic. Think of it as a big grid of light switches, all initially OFF. For each item you want to remember, you flip k specific switches to ON (chosen by hash functions). To check an item, you look at its k switches — if they're all ON, the item might be there. If any is OFF, it's definitely not.
Drawback: The more accurate you want it, the more switches you need to check per query. For a 1% false-positive rate, that's 7 switches. For 0.01%, it's 14. Each "switch check" is a random memory access — and those are slow. Plus, Bloom filters waste 44% more space than the theoretical minimum.
Variant — Blocked Bloom: Packs all switches for one item into one cache-friendly block. Super fast queries, but uses even more space (~30% more than standard Bloom).
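To make the switch analogy concrete, here is a minimal Bloom filter sketch in Python. The class and its way of slicing k positions out of one SHA-256 digest are illustrative choices, not any particular library's scheme:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash positions per key."""
    def __init__(self, m_bits, k_hashes):
        assert k_hashes <= 8          # one SHA-256 digest yields 8 4-byte chunks
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k positions from one digest (an illustrative choice).
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.k):
            chunk = int.from_bytes(digest[4 * i:4 * i + 4], "little")
            yield chunk % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        # All k switches ON -> "maybe"; any OFF -> "definitely not".
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(key))
```

Usage: `bf = BloomFilter(1024, 7); bf.add("alpha"); "alpha" in bf` returns `True`, while a key that was never added almost always returns `False`.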
Instead of flipping switches, cuckoo filters store fingerprints — short codes that represent each key. It's like giving each person a tiny ID badge number, and storing those badges in a clever two-slot hash table.
Only two memory lookups per query (vs. up to 14 for Bloom), and they're 30–40% above the theoretical minimum for 12-bit and 16-bit configurations. They also let you delete items — something Bloom filters can't do easily.
Xor filters are the direct predecessor of binary fuse filters. They're also fingerprint-based, but use a clever XOR trick: each key maps to three array locations, and XOR-ing the values stored at those three locations gives the key's fingerprint. This means exactly 3 lookups per query, regardless of how low the false-positive rate is.
Xor filters are 23% above the minimum — better than Bloom or cuckoo. But their construction was slow (2× slower than alternatives) and they divided the array into only 3 big segments, limiting memory locality.
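The XOR trick itself fits in a few lines. A toy demonstration with made-up locations and an arbitrary fingerprint value:

```python
# Toy demonstration of the XOR trick (illustrative numbers, not a real filter):
B = [0] * 8                  # fingerprint array
h0, h1, h2 = 1, 4, 6         # the key's three hash locations
fingerprint = 0xAB

# Construction sets ONE free slot so the three values XOR to the fingerprint:
B[h2] = fingerprint ^ B[h0] ^ B[h1]

# A query then reads all three slots and XORs them:
assert B[h0] ^ B[h1] ^ B[h2] == fingerprint
```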
Ribbon filters are also fingerprint-based but use Gaussian elimination (a linear algebra technique) to solve for fingerprint values. They're very configurable — you can trade construction time for space efficiency. Construction is fast, but queries are slower than with xor or fuse filters.
The technical name for this change is spatial locality. Instead of dividing the fingerprint array into 3 large segments (as xor filters do), binary fuse filters divide it into hundreds of small, non-overlapping segments. Each key maps to three locations within three consecutive segments.
The segment sizes are constrained to be powers of two (2, 4, 8, 16, ...). This isn't just a nice number — it means the computer can compute hash positions using ultra-fast bitwise operations instead of slow division. The name "binary" comes from this power-of-two requirement.
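A small sketch of why powers of two matter (the segment length here is an assumed example value):

```python
SEGMENT_LENGTH = 256                    # must be a power of two
SEGMENT_MASK = SEGMENT_LENGTH - 1       # 0b11111111

def offset_in_segment(h):
    # With a power-of-two segment length, 'h % SEGMENT_LENGTH' compiles down
    # to a single AND instruction instead of a slow division.
    return h & SEGMENT_MASK

# The two expressions agree for any hash value:
for h in (0, 1, 255, 256, 987654321):
    assert offset_in_segment(h) == h % SEGMENT_LENGTH
```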
The term "fuse" comes from theoretical work by Dietzfelbinger and Walzer on fuse graphs. Think of a fuse as a chain where removing one link triggers a cascade — similar to how the filter's construction "peels" away keys one by one.
In the 3-wise version, each key maps to 3 array locations (just like xor filters). For large sets, the array only needs 1.125× n entries instead of 1.23× n — saving ~10% immediately.
The 4-wise version maps each key to 4 locations. The array shrinks further to 1.075× n entries. The query reads 4 values instead of 3, costing about 30% more time — but the space saving is substantial.
Building a binary fuse filter is like solving a clever puzzle. Here's the step-by-step process, simplified:
Hash every key to find which segment it starts in, then partially sort keys by segment. This is a single linear-time pass — cheap and fast.
For each array slot, count how many keys map to it. Because keys are sorted by segment, this scan moves forward through memory — great for the CPU cache.
Find slots with exactly one key mapped to them. "Peel" that key away — remove it from all three of its slots. This may reveal new singletons, creating a cascade (like toppling dominoes).
Walk through the peeled keys in reverse order. For each key, set one array slot's value so that XOR-ing all three (or four) slot values produces the key's fingerprint. This always works because of the peeling order.
If the peeling gets stuck (less than 1% of the time), just pick new hash functions and retry. Expected number of attempts: about 1.01.
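The steps above can be sketched end to end. This is a simplified, xor-filter-style peel-and-assign construction in Python: it keeps the three-segment layout, the peeling cascade, and the retry loop, but omits the paper's segment-sorting pass and tuned sizing. All helper names and the sizing constant are ours:

```python
import random

def _locations(key, salts, seg_len):
    # One location per segment, so a key's three slots can never collide
    # (xor filters use 3 big segments; binary fuse filters use many small ones).
    return [i * seg_len + hash((key, salts[i])) % seg_len for i in range(3)]

def _fingerprint(key, salts):
    return hash((key, salts[3])) & 0xFF          # 8-bit fingerprint

def construct(keys, seed=0):
    """Simplified 3-wise peel-and-assign construction (a sketch, not the
    paper's tuned layout). Returns the array plus the hashing parameters."""
    n = len(keys)
    seg_len = (int(1.23 * n) + 32) // 3 + 1      # generous xor-filter-style sizing
    m = 3 * seg_len
    while True:
        rng = random.Random(seed)
        salts = [rng.getrandbits(64) for _ in range(4)]
        slots = [[] for _ in range(m)]           # which keys map to each slot
        for idx, key in enumerate(keys):
            for p in _locations(key, salts, seg_len):
                slots[p].append(idx)
        # Peel singletons; each peel may expose new singletons (a cascade).
        order, stack = [], [p for p in range(m) if len(slots[p]) == 1]
        while stack:
            p = stack.pop()
            if len(slots[p]) != 1:
                continue
            idx = slots[p][0]
            order.append((idx, p))               # p is this key's "free" slot
            for q in _locations(keys[idx], salts, seg_len):
                slots[q].remove(idx)
                if len(slots[q]) == 1:
                    stack.append(q)
        if len(order) == n:
            break
        seed += 1                                # peeling got stuck: retry
    # Assign fingerprints in reverse peel order; B[p] is still 0 when set.
    B = [0] * m
    for idx, p in reversed(order):
        l0, l1, l2 = _locations(keys[idx], salts, seg_len)
        B[p] = _fingerprint(keys[idx], salts) ^ B[l0] ^ B[l1] ^ B[l2]
    return B, salts, seg_len

def contains(B, salts, seg_len, key):
    l0, l1, l2 = _locations(key, salts, seg_len)
    return B[l0] ^ B[l1] ^ B[l2] == _fingerprint(key, salts)
```

Every inserted key is found, and an absent key slips through only when three stored values happen to XOR to its fingerprint, which for 8-bit fingerprints happens about 1 time in 256.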
Querying is beautifully simple. Given a candidate key: compute its fingerprint and its three array locations, read the three stored values, and XOR them together.
If the result matches the fingerprint → "probably in the set." If not → "definitely not in the set." That's it. Three memory reads, one XOR, one comparison. Modern CPUs can issue all three memory reads simultaneously.
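A sketch of the query path, with toy hash helpers standing in for the fast 64-bit mixers real implementations use (all names and the segment length are assumptions of this sketch):

```python
import hashlib

SEG_LEN = 1 << 10          # power-of-two segment length (binary fuse style)

def _mix(key, salt):
    # Toy hash helper (illustrative; real implementations use fast mixers).
    raw = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(raw, "big")

def locations(key):
    # One location in each of three consecutive power-of-two segments;
    # '& (SEG_LEN - 1)' replaces a slow modulo, hence the "binary" name.
    return [i * SEG_LEN + (_mix(key, i) & (SEG_LEN - 1)) for i in range(3)]

def fingerprint(key):
    return _mix(key, 3) & 0xFF

def query(B, key):
    """The whole membership test: three reads, one XOR chain, one compare."""
    l0, l1, l2 = locations(key)
    return B[l0] ^ B[l1] ^ B[l2] == fingerprint(key)
```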
The array size and segment size are computed from the number of keys n:
3-wise: Array size ≈ \(\left\lfloor \left(0.875 + 0.25\,\max\!\left(1,\frac{\log 10^6}{\log n}\right)\right) n \right\rfloor\), converging to \(1.125n\) for large n.
4-wise: Array size ≈ \(\left\lfloor \left(0.77 + 0.305\,\max\!\left(1,\frac{\log(6 \times 10^5)}{\log n}\right)\right) n \right\rfloor\), converging to \(1.075n\) for large n.
Translation: For large sets, a 3-wise filter adds ~12.5% padding to the array (so an array of 1.125 million slots for 1 million keys). A 4-wise filter adds only ~7.5%. The xor filter needed 23% padding. The segment sizes grow sublinearly — roughly as \(n^{0.58}\) for 3-wise — so the ratio of array size to segment size increases to hundreds for large sets, giving many small neighborhoods.
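The two sizing formulas translate directly to code (function names are ours):

```python
import math

def fuse3_array_size(n):
    """3-wise binary fuse array size, per the paper's formula."""
    return math.floor(n * (0.875 + 0.25 * max(1.0, math.log(1e6) / math.log(n))))

def fuse4_array_size(n):
    """4-wise binary fuse array size, per the paper's formula."""
    return math.floor(n * (0.77 + 0.305 * max(1.0, math.log(6e5) / math.log(n))))

# Small sets pay more padding; large sets converge to 1.125n and 1.075n.
for n in (10_000, 1_000_000, 100_000_000):
    print(f"n={n:>11,}  3-wise: {fuse3_array_size(n)/n:.3f}n"
          f"  4-wise: {fuse4_array_size(n)/n:.3f}n")
```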
Drag the slider to change the false-positive probability and see how much memory each filter type needs per key:
The researchers ran extensive benchmarks on an Intel Skylake i7-6700 (3.4 GHz, 8 MiB L3 cache) and an AMD EPYC 7262 (Rome Zen 2). Sets of 100 million random 64-bit keys, with queries containing 25% in-set keys. All single-threaded, no disk or network access. Let's dig in.
Lower is better. The theoretical minimum is 0%. Here's how much extra each filter uses:
Binary fuse filters dominate: 3-wise sits at ~13% overhead, 4-wise at ~8%. The xor filter wastes 23%, Bloom filters 44–50%, and blocked Bloom is the worst at 60–70%. Ribbon filters are competitive at ~10–15%, but trail in query speed.
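These overhead numbers follow from the sizing factors and the \(\log_2(1/\epsilon)\) floor. For 8-bit fingerprints the floor is 8 bits per key, so a 1.125× array costs 9.0 bits/key, i.e. 12.5% overhead, matching the ~13% figure (a back-of-envelope sketch, not the paper's measurement code):

```python
import math

def overhead(bits_per_key, epsilon):
    """Space overhead relative to the log2(1/eps) lower bound."""
    return bits_per_key / math.log2(1.0 / epsilon) - 1.0

# 8-bit fingerprints give eps ~ 2^-8, so the lower bound is 8 bits/key.
print(f"3-wise fuse: {overhead(1.125 * 8, 2**-8):.1%}")   # 1.125x array
print(f"4-wise fuse: {overhead(1.075 * 8, 2**-8):.1%}")   # 1.075x array
print(f"xor:         {overhead(1.23 * 8, 2**-8):.1%}")    # 1.23x array
```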
Lower is faster. Tested with 100M keys, 25% in-set queries on Intel Skylake:
Blocked Bloom is the speed champion (it uses AVX2 SIMD instructions). But among space-efficient filters, 3-wise binary fuse matches xor filters at ~30 ns — roughly 2× faster than standard Bloom. The 4-wise variant is slightly slower (~40 ns) due to the extra memory read. Ribbon and vector quotient filters trail at 80–150 ns.
Lower is faster. Building the filter from 100M keys on Intel Skylake:
Binary fuse filters build 2× faster than xor filters — a dramatic improvement. This is thanks to the locality-friendly construction: pre-sorting by segment means memory access patterns are sequential rather than random. Cuckoo filters are the slowest to build.
| Filter | Bits / Key | Overhead | Query (ns) | Build (ns) | Verdict |
|---|---|---|---|---|---|
| 3-wise Binary Fuse | 9.0 | ~13% | ~30 | ~36 | ★ Best Overall |
| 4-wise Binary Fuse | 8.6 | ~8% | ~40 | ~30 | ★ Smallest |
| Xor | 9.84 | ~23% | ~30 | ~120 | Superseded |
| Bloom (12-bit) | ~12 | ~44% | ~60 | ~40 | Classic |
| Blocked Bloom | ~16+ | ~60%+ | ~15 | ~12 | Speed King |
| Cuckoo (12-bit) | ~13.5 | ~30% | ~40 | ~120 | Slow Build |
| Standard Ribbon (8-bit) | ~9.2 | ~15% | ~80 | ~65 | Flexible |
| Vector Quotient | ~11.7 | ~46% | ~55 | ~90 | Platform-dependent |
Note: Numbers are approximate, read from the paper's figures. The 16-bit versions double the bits/key and cut the false-positive rate by far more than half (an 8-bit fingerprint gives \(\epsilon \approx 2^{-8}\); a 16-bit fingerprint gives \(\epsilon \approx 2^{-16}\)). Binary fuse filters win for sets over ~10,000 keys; for tiny sets, xor filters may be marginally better.
The paper tests sets from 10,000 to 100 million keys. Key findings on scaling:
The authors explicitly state: "binary fuse filters render xor filters practically obsolete." Same query speed, much less memory, much faster to build.
The paper reveals a clear decision framework:
If space matters most: use 4-wise binary fuse. Only ~8% overhead. Queries are ~30% slower than 3-wise, but the space savings may matter more.
For the common case: use 3-wise binary fuse. 13% overhead with xor-matching query speed. This is the new default recommendation.
If raw query speed is everything: use blocked Bloom with AVX2. Fastest queries, but 60%+ overhead.
If the set changes over time: binary fuse filters require all keys upfront, so use cuckoo filters or Bloom filters for dynamic sets.
Reference implementations are available at github.com/FastFilter/. This chart reproduces Figure 1 from the paper — the theoretical bits per key for each filter type across different false-positive probabilities:
At every false-positive level, binary fuse filters (especially 4-wise) are the most space-efficient option. The gap widens as you demand lower false-positive rates.
Here's what you can confidently tell someone about this paper: