Ruby has a reputation for being slow. So does Python. So does JavaScript — or at least, it did, until V8 made it fast enough to run server-side workloads. The story of how dynamic languages get fast is the story of JIT (Just-In-Time) compilation, and it's one of the most fascinating areas of practical computer science. The latest chapter: Ruby's ZJIT is removing redundant object loads and stores at the intermediate representation level, the same class of optimization that made V8's TurboFan so effective.
If you've ever wondered why your Ruby code runs at a fraction of C's speed, or how JavaScript became fast enough to power VS Code, the answer lies in understanding what JIT compilers actually do — and what makes optimizing dynamic languages fundamentally harder than optimizing static ones.
The Fundamental Problem With Dynamic Languages
When a C compiler sees a + b, it knows the types of a and b at compile time. If they're both integers, it emits a single ADD instruction. If they're floats, it emits a floating-point add. The CPU executes that instruction in one cycle. There's no ambiguity, no decision-making at runtime.
When a Ruby interpreter sees a + b, it knows almost nothing. a could be an integer, a float, a string, an array, or any object that defines a + method. The interpreter has to: check the type of a, find the + method for that type, check the type of b, potentially coerce types, handle edge cases (overflow, frozen objects), and finally perform the operation. That single + might involve dozens of instructions, memory lookups, and branch decisions.
```ruby
# What the programmer writes:
def sum(a, b)
  a + b
end

# What the interpreter actually has to do (pseudocode):
def sum(a, b)
  method = lookup_method(a.class, :+)         # Hash lookup
  raise NoMethodError unless method           # Branch
  if a.is_a?(Integer) && b.is_a?(Integer)     # Type checks
    result = integer_add(a.value, b.value)    # Actual math
    if overflow?(result)                      # Overflow check
      result = promote_to_bignum(result)      # Bignum promotion
    end
  elsif a.is_a?(Float) || b.is_a?(Float)
    result = float_add(to_float(a), to_float(b))
  elsif a.is_a?(String)
    result = string_concat(a, b.to_s)
  else
    result = call_method(method, a, [b])      # Generic dispatch
  end
  result
end
```
This overhead — type checking, method lookup, dispatch — is the tax you pay for dynamism. The flexibility to write a + b and have it work for integers, floats, strings, and custom objects is expensive at runtime.
How JIT Compilation Fights Back
A JIT compiler's job is to observe what the program actually does at runtime and generate optimized machine code based on those observations. The key insight: while Ruby code could operate on any type, in practice, any given call site almost always sees the same types.
If sum(a, b) has been called 10,000 times and a and b have always been integers, the JIT can generate specialized machine code that assumes they'll continue to be integers. It emits a single integer add instruction with a 'guard' — a quick type check that bails out to the slow path if the assumption is ever violated.
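The shape of that specialized code can be sketched in plain Ruby. This is a toy model: a real JIT emits machine code, and `specialized_sum` and `GENERIC_ADD` are names invented for this sketch. The point is that the guard is a single cheap comparison, while the fallback preserves full Ruby semantics when the assumption breaks.

```ruby
# Toy model of guarded specialization (illustrative only; a real JIT
# emits machine code, not Ruby). The fast path assumes both operands
# are Integers; the guard bails out to generic dispatch otherwise.
GENERIC_ADD = ->(a, b) { a + b }  # stands in for the interpreter's slow path

def specialized_sum(a, b)
  # Guards: single type checks, mirroring the JIT's emitted comparisons
  return GENERIC_ADD.call(a, b) unless a.is_a?(Integer) && b.is_a?(Integer)
  a + b  # fast path: in real JIT output, a single ADD instruction
end
```

Calling `specialized_sum(2, 3)` takes the fast path; `specialized_sum("a", "b")` fails the guard and still returns `"ab"` via the generic path.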
Interpreter execution of sum(a, b) where a and b are Integers:
1. Load a from stack
2. Check if a is an object reference
3. Load a's class pointer
4. Look up :+ in method table (hash lookup)
5. Check method visibility
6. Load b from stack
7. Check if b is an object reference
8. Check if b is compatible type
9. Unbox a's integer value
10. Unbox b's integer value
11. Perform addition
12. Check for overflow
13. Box result as new Integer object
14. Return result
JIT-compiled version (after profiling shows a,b are always Integers):
1. Guard: check a is Integer (bail if not)
2. Guard: check b is Integer (bail if not)
3. ADD instruction
4. Guard: check no overflow (bail if overflow)
5. Return result
That's a 14-step process reduced to 5 steps, where steps 1-2 and 4 are single comparison instructions. The actual computation — the ADD — is one CPU cycle. This is how JavaScript went from 'too slow for anything serious' to 'fast enough to run a full IDE.'
Ruby's JIT Journey: From MJIT to YJIT to ZJIT
Ruby's history with JIT compilation is a case study in how hard this problem is. Ruby has gone through several JIT implementations, each taking a different approach.
MJIT (Ruby 2.6, 2018) took the approach of translating Ruby bytecode to C code, then calling GCC or Clang to compile it. This produced well-optimized code but had terrible warm-up time — compiling C takes seconds, not milliseconds. By the time the JIT-compiled code was ready, the program might have already finished executing.
YJIT (Ruby 3.1, 2021) was Shopify's contribution, written first in C and later rewritten in Rust. YJIT uses a technique called 'lazy basic block versioning' — it compiles code one basic block at a time, only when that block is actually executed, and creates specialized versions based on the types it observes. This gives fast warm-up (milliseconds, not seconds) with good peak performance. YJIT typically improves Ruby performance by 15-30% on real-world workloads like Rails applications.
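YJIT ships with CRuby, enabled with the `--yjit` flag (or the `RUBY_YJIT_ENABLE=1` environment variable on recent CRuby). You can check its status at runtime; a minimal, defensive check, since `RubyVM::YJIT` is only defined on CRuby builds that include YJIT:

```ruby
# Query YJIT status on CRuby 3.1+ (other Ruby implementations, and
# builds without YJIT, won't define RubyVM::YJIT at all).
if defined?(RubyVM::YJIT)
  puts "YJIT enabled: #{RubyVM::YJIT.enabled?}"
else
  puts "This Ruby build has no YJIT support compiled in"
end
```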
ZJIT is the next evolution, and it's where things get really interesting. ZJIT introduces an intermediate representation (IR) — a structured representation of the program between bytecode and machine code. This IR enables classical compiler optimizations that YJIT's direct bytecode-to-machine-code approach couldn't easily do.
Eliminating Redundant Loads and Stores
The specific optimization that ZJIT recently landed — removing redundant object loads and stores — sounds arcane but has a huge impact. Here's why.
Ruby objects store their instance variables in a property table. Every time you read @name, the interpreter loads the value from the object's property table in memory. Every time you write @name = value, it stores to that table. In a method that accesses the same instance variable multiple times, the interpreter loads it from memory each time — because in the general case, something might have changed the value between reads.
```ruby
class Rectangle
  def area
    @width * @height
  end

  def perimeter
    # @width and @height are loaded from memory on every call
    2 * (@width + @height)
  end

  def scale(factor)
    @width  = @width * factor    # Load @width, multiply, store @width
    @height = @height * factor   # Load @height, multiply, store @height
    self
  end
end

# In a tight loop, these redundant loads add up:
rects.each { |r| r.scale(2).perimeter }
```
With an IR, ZJIT can perform load elimination: if @width was already loaded and nothing has modified it since, reuse the value from a register instead of loading it from memory again. It can also perform store elimination: if you write to @width twice in sequence without anyone reading it between writes, the first store can be eliminated.
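A toy version of the load-elimination half is easy to sketch over a linear IR. The opcode names and list-of-arrays format here are invented for illustration (ZJIT's real IR is a proper graph, and store elimination would need an additional backward pass): walk forward, remember which instance variables already have their value in a "register", and drop loads whose value is still known.

```ruby
# Toy redundant-load elimination over a linear IR (hypothetical opcodes).
# Each instruction is [op, ivar_name]. A call can mutate any ivar
# (aliasing), so it invalidates everything we know.
def eliminate_redundant_loads(ir)
  known = {}   # ivar name => true if its value is already in a register
  out = []
  ir.each do |op, ivar|
    case op
    when :load
      next if known[ivar]    # value already available: drop the load
      known[ivar] = true
      out << [op, ivar]
    when :store
      known[ivar] = true     # the value just stored is now known too
      out << [op, ivar]
    when :call
      known.clear            # a call may alias/mutate any ivar
      out << [op, ivar]
    end
  end
  out
end
```

Feeding it `[[:load, :@width], [:load, :@height], [:load, :@width]]` drops the second `@width` load; a load right after a store to the same ivar is also dropped, because the stored value can be forwarded from a register.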
These optimizations are table stakes in static language compilers — GCC and LLVM have done this for decades. But doing it for a dynamic language is much harder because of aliasing. In Ruby, calling any method could potentially modify any object's instance variables (via instance_variable_set, method_missing, or trace hooks). The JIT has to prove that between two reads of @width, nothing could have changed it — which requires analyzing what every intervening operation might do.
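The aliasing hazard is easy to demonstrate: an innocent-looking method call can rewrite another object's instance variable, so a value cached before the call would be stale. The `Point` class and `sneaky_reset` helper here are invented for the demonstration.

```ruby
# Why a JIT can't blindly reuse a loaded ivar across a call: any
# intervening method might mutate it, e.g. via instance_variable_set.
class Point
  def initialize(x)
    @x = x
  end

  def x
    @x
  end
end

def sneaky_reset(obj)
  obj.instance_variable_set(:@x, 0)  # mutates the ivar from "outside"
end

pt = Point.new(42)
before = pt.x        # a JIT might want to cache this load...
sneaky_reset(pt)     # ...but this call invalidates the cached value
after = pt.x
puts "before=#{before} after=#{after}"  # prints: before=42 after=0
```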
The IR Advantage
The reason ZJIT introduced an IR (and why V8's TurboFan, JavaScriptCore's DFG/FTL, and HotSpot's C2 all use IRs) is that it makes these optimizations composable. An IR is essentially a graph of operations where optimizations can be applied as graph transformations.
- Constant folding: If both operands of an addition are known constants, replace the operation with its result. 2 + 3 becomes 5 at compile time.
- Dead code elimination: If an operation's result is never used, remove it entirely.
- Common subexpression elimination: If the same computation appears twice, compute it once and reuse the result.
- Load/store elimination: Remove redundant memory operations as described above.
- Escape analysis: If an object is created and never leaves the current method, allocate it on the stack instead of the heap (or eliminate the allocation entirely).
- Inlining: Replace a method call with the method body, exposing more opportunities for the other optimizations.
These optimizations compound. Inlining a method exposes its operations to the caller's context, which may reveal constant values, which enable constant folding, which makes code dead, which gets eliminated. A single inlining decision can cascade into removing dozens of operations.
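Two of those passes composing can be shown on a toy linear IR. Everything here (the `[dest, op, *args]` instruction format, the pass names) is invented for illustration and looks nothing like ZJIT's real graph IR: constant folding turns an add of two known constants into a constant, and dead code elimination then sweeps away instructions whose results are never used.

```ruby
# Toy IR: each instruction is [dest, op, *args]; args are register
# symbols or Integer constants. (Hypothetical format, for illustration.)
def constant_fold(ir)
  consts = {}  # register => known constant value
  ir.map do |dest, op, *args|
    args = args.map { |a| consts.fetch(a, a) }  # substitute known constants
    if op == :const
      consts[dest] = args[0]
      [dest, :const, args[0]]
    elsif op == :add && args.all? { |a| a.is_a?(Integer) }
      consts[dest] = args.sum
      [dest, :const, args.sum]                  # fold e.g. 2 + 3 into 5
    else
      [dest, op, *args]
    end
  end
end

def eliminate_dead(ir, live)
  used = live.dup
  ir.reverse.filter_map { |dest, op, *args|
    next unless used.include?(dest)             # result unused: drop it
    args.each { |a| used << a if a.is_a?(Symbol) }
    [dest, op, *args]
  }.reverse
end
```

Folding collapses the whole chain to constants, after which only the live result survives dead code elimination; four instructions become one.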
The Deoptimization Safety Net
Everything a JIT compiler does is speculative. It assumes types won't change, methods won't be redefined, and monkey-patching won't invalidate its optimized code. When those assumptions break, the JIT needs to 'deoptimize' — throw away the optimized code and fall back to the interpreter.
Deoptimization is one of the hardest parts of JIT design. The optimized code may have eliminated local variables, reordered operations, or inlined deeply nested calls. To fall back to the interpreter, the JIT must reconstruct the interpreter state — all local variables, the call stack, the program counter — from whatever the optimized code has available. This requires maintaining side tables of deoptimization metadata that map optimized-code states back to interpreter states; the related mechanism of swapping a running frame between tiers is called on-stack replacement (OSR).
When deoptimization happens frequently — a condition called 'deopt thrashing' — performance can be worse than pure interpretation. The JIT spends time compiling optimized code, running it briefly, deoptimizing, and repeating. V8 handles this by tracking deoptimization counts and eventually giving up on optimizing a particular function. YJIT takes a simpler approach: it generates multiple versions of each code path for different type combinations, which reduces the need for deoptimization at the cost of more generated code.
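The give-up heuristic can be sketched as a counter on the call site. This is a toy model with invented names and an assumed threshold; real engines track this in their profiling metadata, not in Ruby objects.

```ruby
# Toy model of per-call-site deopt tracking (illustrative only).
class CallSite
  MAX_DEOPTS = 3  # assumed threshold, invented for this sketch

  def initialize
    @expected = nil  # the operand type we've "compiled" for
    @deopts   = 0
  end

  def add(a, b)
    return a + b if gave_up?   # blacklisted: stay on the generic path
    @expected ||= a.class      # specialize for the first type seen
    if a.instance_of?(@expected)
      a + b                    # specialized fast path (guard passed)
    else
      @deopts += 1             # guard failed: deoptimize...
      @expected = nil          # ...and discard the "compiled" code
      a + b                    # the slow path still computes the answer
    end
  end

  def gave_up?
    @deopts >= MAX_DEOPTS
  end
end
```

A call site that keeps flip-flopping between Integer and String operands deopts three times and is then left permanently on the generic path, which caps the cost of recompilation.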
Why This Matters Beyond Ruby
Ruby's JIT journey mirrors what's happening across dynamic languages. Python's copy-and-patch JIT (landed as an experimental feature in CPython 3.13) is the first step toward proper JIT compilation for Python. LuaJIT has been remarkably fast for years thanks to its aggressive tracing JIT. PHP gained a JIT in PHP 8.0, built into OPcache on top of the engine's existing opcode optimization pipeline.
The pattern is consistent: start with an interpreter, add profiling to understand runtime behavior, compile hot paths with type specialization, introduce an IR for classical optimizations, and refine. Each language faces the same challenges — dynamic dispatch, mutable objects, eval, monkey-patching — and arrives at similar solutions.
For developers using these languages, the practical takeaway is that the performance gap between dynamic and static languages is narrowing. It will never close entirely — the type checks and guards still cost something, and deoptimization is an inherent overhead. But a well-optimized JIT can get within 2-5x of equivalent C code for computational workloads, which is 'fast enough' for the vast majority of applications.
The more subtle takeaway: write straightforward code. JIT compilers optimize predictable patterns. Monomorphic call sites (where a method always receives the same types) optimize well. Polymorphic call sites (where types vary) are harder. Megamorphic call sites (dozens of types) may never get optimized. Code that's simple for a human to understand is usually simple for a JIT to optimize — which is a nice alignment of incentives.
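The distinction is about what a single call site observes over time, not how the method is defined. A small illustration (the variable names are invented for this sketch):

```ruby
# Monomorphic vs polymorphic call sites (conceptual illustration).
# The JIT cares about the types seen at each *call site*, not at the
# method definition.

nums_mono = Array.new(100) { 3 }           # always Integer
nums_poly = [3, 2.5, "7", Rational(1, 2)]  # several receiver types

# Monomorphic: `n.to_s` here always dispatches to Integer#to_s,
# so an inline cache with a single entry covers every call.
mono = nums_mono.map { |n| n.to_s }

# Polymorphic: the same syntactic call dispatches to a different
# to_s each iteration; the inline cache needs multiple entries,
# and with enough distinct types it goes megamorphic.
poly = nums_poly.map { |n| n.to_s }
```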