Ottawa Linux Symposium 2002

Futexes: The Secret to Fast Locking in Linux

How three engineers made Linux's process synchronization dramatically faster โ€” by keeping the kernel out of the fast path.

Scroll to explore โ†“
The Problem

Why Do Programs Need Locks?

When multiple processes or threads access the same data simultaneously, chaos ensues. Locks are the traffic lights of the computing world.

๐Ÿ’ฅ

The Race Condition

Imagine two bank tellers reading the same account balance of $100, each processing a $60 withdrawal. Without a lock, both approve โ€” and the bank loses $20.

๐Ÿ”’

Locks to the Rescue

A lock ensures only one process can access shared data at a time. The first teller locks the account, processes the withdrawal, then unlocks โ€” the second teller sees the updated balance.

๐Ÿข

But Locks Were Slow

Before futexes, every lock/unlock required a system call โ€” a costly trip into the kernel. Even when nobody else was competing for the lock!

The Insight

Skip the Kernel When You Can

The key insight: most of the time, there's no contention. So why bother the kernel at all?

โŒ Old Way: System V Semaphores

Every single lock operation requires a system call โ€” even when no one else wants the lock.

  1. Process wants the lock
  2. Makes an expensive system call to the kernel
  3. Kernel checks lock state
  4. Kernel grants the lock
  5. Returns to user space (hundreds of CPU cycles wasted!)

โœ… New Way: Futexes

Use atomic CPU instructions in user space. Only call the kernel when there's actual contention.

  1. Process wants the lock
  2. Atomically flips a bit in shared memory
  3. Got it? Done! No kernel call needed.
  4. Contended? Only then call the kernel to sleep.
  5. Kernel wakes you when the lock is free
~85%
Efficiency without kernel calls (futex, uncontended)
~25%
Efficiency with System V semaphores
233
Lines of kernel code for the futex implementation
Interactive Demo

See a Futex in Action

Click the buttons to simulate processes trying to acquire a shared lock. Watch how futexes handle contention.

๐Ÿ”“
Lock is FREE
Processes:

Controls

โ†’ Ready. Click a button to start.
History

The Evolution of Futexes

From early prototypes to kernel inclusion โ€” the journey of fast userlevel locking in Linux.

Early Designs
Explicit Kernel Objects
The first approach allocated a kernel object and exported its address as a handle. Required explicit allocation/deallocation and security keys. Too heavyweight.
Prototype: ulocks
User Semaphores with Fair & CA Wakeup
Implemented general user semaphores and read/write locks. Used hash tables to map virtual addresses to kernel objects. Worked well, but the infrastructure was deemed too complex.
Linux 2.5.7 (Early 2002)
Futexes Enter the Kernel ๐ŸŽ‰
Combined the best elements of multiple independent implementations. Only 233 lines of kernel code! Used page pointer + offset as unique identifier. Hash table on kernel stack.
Post-2.5.7 Refinement
FUTEX_WAIT / FUTEX_WAKE API
Replaced FUTEX_UP/DOWN with a more flexible compare-and-swap style interface. Added timeout support, broadcast wakeups, and POSIX threads compatibility.
Also Introduced
Furwocks (Fast Userspace R/W Locks)
Paul Mackerras designed read/write locks using two futexes and a counter โ€” no kernel changes needed! Outperformed the integrated ulock approach in benchmarks.
Fairness

Who Gets the Lock Next?

Different wakeup policies trade off fairness for performance. The paper explores three strategies.

FIFO queueing: The lock is granted in the exact order processes requested it. Prevents starvation, but can cause the "convoy problem" โ€” everyone moves at the speed of the slowest process.
When lock is released, who gets it next?
Performance

The Numbers Don't Lie

Benchmarks on a real database (TDB) show futexes dramatically outperforming traditional locks.

TDB Database Benchmark โ€” Completion Time (seconds, lower is better)

6 processes doing 200,000 random database operations each

Synthetic Benchmarks

Up to 11,000% Faster

With 1000 tasks competing for a single lock, convoy-avoidance futexes showed over 100ร— the throughput improvement over System V semaphores.

% Improvement Over System V Semaphores

Regular (convoy-avoidance) futexes, (9,1) configuration โ€” lower contention

Under the Hood

How Futexes Actually Work

The elegance is in the simplicity. A futex is just an integer in shared memory.

// The futex "lock" is just an integer in shared memory int futex_word = 1; // 1 = available, 0 = locked, negative = waiters // LOCK (user space - fast path) if (atomic_dec(&futex_word) != 0) { // Someone else has it โ€” enter kernel sys_futex(&futex_word, FUTEX_WAIT, expected_val); } // We have the lock! // UNLOCK (user space - fast path) if (atomic_inc(&futex_word) != 1) { // There were waiters โ€” wake one up sys_futex(&futex_word, FUTEX_WAKE, 1); } // Done!

๐Ÿ”‘ The Key Innovation

Futexes identify locks by their physical page + offset in memory, not by a kernel handle. This means different processes can map the same lock at different virtual addresses, and the kernel still knows it's the same lock. Kernel objects are created on-demand on the stack โ€” no pre-allocation needed!

โšก

FUTEX_WAIT

"I expect this value at the futex address. If it's still that value, put me to sleep. If it changed, return immediately." This prevents race conditions.

๐Ÿ“ฃ

FUTEX_WAKE

"Wake up N processes waiting on this futex." Doesn't change the futex value โ€” that's left to user space. Returns how many were actually woken.

Key Concepts

The Three Big Challenges

๐ŸŽ๏ธ

The Convoy Problem

With FIFO locking, all processes move at the speed of the slowest one โ€” like a highway convoy stuck behind a truck. Convoy-avoidance locking lets the fastest process grab the lock immediately.

๐Ÿ˜

Thundering Herd

Waking ALL waiters when a lock is released wastes CPU โ€” only one can succeed. The solution: wake only one waiter (FUTEX_WAKE with val=1).

โ˜ ๏ธ

Dead Locks

If a process dies while holding a lock, the lock is "dead" โ€” no one can acquire it. This remains an unsolved challenge for futexes since lock acquisition is in user space.

Impact

Why This Still Matters

Futexes became the foundation of virtually all synchronization in modern Linux.

๐Ÿงต

POSIX Threads (pthreads)

Every pthread_mutex_lock(), pthread_cond_wait(), and semaphore in modern Linux is built on top of futexes.

๐Ÿ—„๏ธ

Databases

Systems like PostgreSQL, MySQL, and SQLite benefit from futex-based locking for internal buffer management and coordination.

๐ŸŒ

Every Linux Application

Any multi-threaded program on Linux โ€” from web servers to desktop apps โ€” uses futexes under the hood whenever threads synchronize.