Code-Generation On-the-Fly

What if your computer could translate programs as it loads them, making software run on any machine — instantly?

By Michael Franz · Supervised by Prof. N. Wirth & Prof. J. Gutknecht · 1994

The Problem

Why Can't Software Just Work Everywhere?

In the early 1990s — just as today — different computers spoke different "machine languages." A program compiled for an Intel chip couldn't run on a Motorola chip. This meant software had to be recompiled for every platform, costing time and money. Michael Franz asked: What if we could delay the final step of compilation until the exact moment a program is loaded?

🔒

Platform Lock-In

Compiled programs are tied to one processor architecture. Sharing software across platforms requires recompilation from source — if you even have the source.

💸

High Distribution Cost

Software vendors must maintain separate binaries for every target machine, multiplying development and testing effort.

🐌

Interpreters Are Slow

Previous portable solutions used interpreters (such as UCSD Pascal's P-code, an ancestor of Java's JVM), but these ran 5–20× slower than native code.

The Solution

Semantic Dictionary Encoding (SDE)

Franz invented a new way to store programs called Semantic Dictionary Encoding. Instead of machine instructions or abstract-machine bytecodes, SDE stores a compressed representation of the program's meaning — its syntax tree — as a series of indices into a growing dictionary. Think of it like a ZIP file, but specifically designed for code, that can be turned into native machine instructions at blazing speed.

2.5× smaller than native object files
~1.5× load time compared with native loading
100% of source-level information preserved
Comparable code quality vs. an optimizing C compiler
How It Works

The Pipeline: From Source to Execution


📝 Source Code (Oberon) → 🗜️ SDE Encoder (Compiler Front-End) → 📦 SDE File (Portable) → ⚙️ Code-Gen Loader (On-the-Fly)

📝 Source Code (Oberon)

The programmer writes code in the Oberon programming language — a clean, modular language designed by Niklaus Wirth. The source text is a human-readable description of what the program does.

Deep Dive

How the Semantic Dictionary Works

The dictionary starts with templates for every language operation (addition, assignment, etc.) and an entry for every variable. As the program is encoded, new entries are added for each combination as it is first encountered, so later repetitions can be encoded compactly — like autocomplete for code patterns.

1. Source Code
2. Initial Dictionary
3. First Encoding
4. Dictionary Grows
5. Reuse!
MODULE M;
  VAR i, j, k: INTEGER;
BEGIN
  i := i + j;  (* statement 1 *)
  j := i + k;  (* statement 2 *)
  k := i + j;  (* statement 3 *)
  i := i + j;  (* statement 4 — same as 1! *)
END M.

Notice statement 4 is identical to statement 1. SDE will exploit this repetition.

── Initial Semantic Dictionary ──

asgn │ assignment │ . := . │ template (left & right missing)
plus │ addition   │ . + .  │ template (left & right missing)
vi   │ variable   │ i      │ complete entry
vj   │ variable   │ j      │ complete entry
vk   │ variable   │ k      │ complete entry

Templates have "holes" (shown as dots). Complete entries represent concrete variables. The dictionary knows every possible operation and every declared variable.

── Encoding: i := i + j ──

File output:   asgn      vi     plus      vi    vj
                ↑         ↑      ↑         ↑     ↑
             ". := ."    "i"  ". + ."     "i"   "j"

Five dictionary indices encode the entire statement!

Each index is a tiny number pointing into the dictionary. The decoder reads them and knows exactly what code to generate.

── New entries added after encoding "i := i + j" ──

n+0 │ addition   │ i + j    │ complete — seen this expression!
n+1 │ addition   │ i + .    │ template — "i plus something"
n+2 │ assignment │ i := .   │ template — "assign to i"
n+3 │ assignment │ i := i+j │ complete — the whole statement!

The dictionary speculatively grows with new entries based on what was just seen. If similar patterns appear later, they can be encoded much more compactly.

── Encoding statement 4: "i := i + j" (again!) ──

File output: n+3

Just ONE dictionary index! The entire statement was already in the dictionary. The decoder can even copy the machine code it generated the first time.

This is the magic. Repeated patterns collapse to single indices. The decoder can also use "code-copying" — just duplicating the machine instructions it already generated for the same expression.
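The walkthrough above can be condensed into a toy Python model. This is a sketch, not Franz's encoder: it covers only assignments, '+', and variables, and it adds only complete subtrees to the dictionary (real SDE also adds the partially filled templates shown above), but it reproduces the key effect: statement 4 collapsing to a single index.

```python
# Toy model of Semantic Dictionary Encoding (illustrative; real SDE encodes
# full Oberon syntax trees and also adds partially filled templates).

def make_dictionary(variables):
    """Initial dictionary: one template per operation, one entry per variable."""
    entries = ["asgn", "plus"] + list(variables)
    return {e: i for i, e in enumerate(entries)}   # entry -> index

def encode_tree(tree, dictionary, out):
    if tree in dictionary:                  # complete entry already known:
        out.append(dictionary[tree])        # a single index suffices
        return
    op, left, right = tree                  # otherwise: emit the operation's
    out.append(dictionary[op])              # template index, then the holes
    encode_tree(left, dictionary, out)
    encode_tree(right, dictionary, out)
    dictionary[tree] = len(dictionary)      # speculatively add what we just saw

def encode(statements, variables):
    dictionary, out, per_stmt = make_dictionary(variables), [], []
    for tree in statements:
        start = len(out)
        encode_tree(tree, dictionary, out)
        per_stmt.append(len(out) - start)
    return out, per_stmt

# The four statements of MODULE M, as (op, left, right) trees:
stmts = [
    ("asgn", "i", ("plus", "i", "j")),   # i := i + j
    ("asgn", "j", ("plus", "i", "k")),   # j := i + k
    ("asgn", "k", ("plus", "i", "j")),   # k := i + j   (reuses "i + j")
    ("asgn", "i", ("plus", "i", "j")),   # i := i + j   (same as statement 1)
]
indices, per_stmt = encode(stmts, "ijk")
print(per_stmt)   # → [5, 5, 3, 1]: statement 4 needs just one index
```

Statement 3 already benefits from the shared subexpression "i + j" (three indices instead of five), and statement 4, having been seen whole, becomes one index.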

Benchmarks

How Compact Is SDE?

SDE files are dramatically smaller than native object files — about 2.5× more compact on average. Smaller files mean faster disk reads, which partly compensates for the time spent generating code.

[Chart: File sizes of selected modules, native object vs. SDE, measured both without and with runtime checks]
Performance

Loading Speed Across Processors

Franz tested on four generations of Motorola 680x0 processors. As processors got faster, the gap between native loading and SDE code-generation shrank rapidly — from 40% overhead down to just 7%.

[Chart: Load times (ms) for Draw.Open and Counters.Graph on different machines]
📈

The Trend Favors SDE

CPU speed grows faster than disk speed improves. Since SDE files are smaller (less disk I/O) and code generation is CPU-bound, the technique gets more competitive over time.

⚙️

Code Quality

The dynamically generated code matched or beat Apple's optimizing MPW C compiler on 6 out of 10 benchmarks — despite being generated in milliseconds.

Code Quality

SDE vs. Optimizing C Compiler

Execution times (ms) on a Macintosh Quadra 840AV. Lower is better; the Winner column names the faster compiler.

Benchmark               │ SDE (on-the-fly) │ Apple MPW C (optimized) │ Winner
Permutation             │   83 │   113 │ SDE
Towers of Hanoi         │   83 │   121 │ SDE
Eight Queens            │   50 │    43 │ MPW C
Integer Matrix Multiply │  150 │   173 │ SDE
Real Matrix Multiply    │  133 │   171 │ SDE
Puzzle                  │  800 │   800 │ Tie
Quicksort               │   66 │    61 │ MPW C
Bubblesort              │  117 │    88 │ MPW C
Treesort                │   83 │ >1000 │ SDE
FFT                     │  133 │   123 │ MPW C
Why It Matters

Benefits Beyond Portability

Deferring code generation to load time opens up several surprising capabilities that go well beyond just running on different machines.

🛡️ Load-Time Safety Decisions

Decide when loading whether to include runtime checks (array bounds, nil pointers, type tests). No need to maintain separate "debug" and "release" builds — one SDE file serves both purposes.
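As a sketch (pseudo-instructions, not real 680x0 code), this load-time choice can be modelled as a flag consulted during code generation:

```python
# Toy sketch of a load-time decision: the same SDE input can be compiled
# with or without bounds checks, so one distribution file replaces separate
# "debug" and "release" builds. Instruction names here are pseudo-code.

def gen_array_load(array, index, emit_checks):
    """Generate code for 'array[index]'; emit_checks is chosen by whoever
    LOADS the module, not by whoever compiled it."""
    code = []
    if emit_checks:
        code.append(("CHECK_BOUNDS", index, array))   # only in checked loads
    code.append(("LOAD_ELEM", array, index))
    return code

print(gen_array_load("a", "i", emit_checks=True))
print(gen_array_load("a", "i", emit_checks=False))
```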

🔧 Fewer Recompilations

When a library module's internals change (but not its interface), SDE clients don't need source-level recompilation. The code-generating loader silently adjusts field offsets and sizes at load time.
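A minimal sketch with hypothetical names (gen_field_load, the layout tables are illustrative, not Franz's API): because the loader reads field offsets from the library version actually installed, an interface-preserving relayout needs no client recompilation.

```python
# Hedged sketch: code generated at load time takes field offsets from the
# library's CURRENT metadata, not from numbers baked into a binary.

def gen_field_load(record_layout, field):
    """Emit a load using the offset found in the library loaded right now."""
    return ("LOAD_OFFSET", record_layout[field])

old_layout = {"x": 0, "y": 4}              # layout when the client was written
new_layout = {"flags": 0, "x": 4, "y": 8}  # library later adds a field

# A precompiled binary would still use offset 0 for "x" and break.
# The code-generating loader simply consults the new layout:
print(gen_field_load(new_layout, "x"))   # → ('LOAD_OFFSET', 4)
```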

🎯 Processor-Specific Optimization

Even within the same CPU family (e.g., 68020 vs 68040), the code generator can tailor instruction scheduling to the exact processor model running the code. Existing software benefits from new CPUs automatically.
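The same idea can be sketched as a dispatch table; the model names and instruction choices below are assumptions for illustration, not measured 680x0 behaviour.

```python
# Illustrative only: the point is that lowering is chosen at LOAD time,
# once the exact CPU model is known, so old software exploits new CPUs.

LOWERINGS = {
    # one SDE-level operation, several machine-level spellings
    "mul_by_8": {
        "68040":   [("LSL", "#3")],    # pretend a shift wins on this model
        "default": [("MULS", "#8")],
    },
}

def lower(op, cpu_model):
    """Pick the instruction sequence suited to the detected processor."""
    table = LOWERINGS[op]
    return table.get(cpu_model, table["default"])

print(lower("mul_by_8", "68040"))   # tuned for the exact CPU running the code
print(lower("mul_by_8", "68020"))   # falls back to the generic sequence
```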

🏗️ Software Component Industry

Franz argued this enables "plug-in" software components — libraries sold in SDE form that work on any compatible machine without recompilation, just like hardware components. (This vision foreshadowed Java bytecodes, released the following year.)

Historical Context

A Visionary Idea, One Year Before Java

This dissertation was accepted in 1994 — one year before Sun Microsystems released Java with its "write once, run anywhere" promise using JVM bytecodes. Franz's approach was fundamentally different and in some ways more ambitious: instead of interpreting an abstract machine, SDE generates native code at full speed. The core insight — that compact intermediate representations plus fast code generation can replace binary compatibility — remains influential in modern JIT compilation, .NET, WebAssembly, and beyond.

🔮

Predicted JIT Compilation

The idea that processor speed outpaces disk speed, so on-the-fly compilation from compact formats will win, is now mainstream (JIT compilers such as V8 and HotSpot).

🌐

Foreshadowed WebAssembly

WASM's compact binary format that browsers compile to native code at load time echoes Franz's SDE approach three decades later.

📐

Influenced ANDF/TDF

The OSF's Architecture Neutral Distribution Format (ANDF) pursued similar goals, though SDE was 4× more compact and targeted dynamic loading rather than static linking.