The Art of Dithering: Beautiful Images With One Bit

Lucas Pope's Return of the Obra Dinn is one of the most visually striking games ever made, and it uses exactly two colors: black and white. One bit per pixel. No grays, no gradients, no anti-aliasing. Every pixel is either on or off. Yet somehow, the game renders a fully 3D world with recognizable characters, depth perception, and atmospheric lighting — all through clever dithering.

Dithering is the technique of arranging black and white pixels in patterns that trick your eye into seeing intermediate shades. It's been around since the earliest days of computer graphics, and it's having a revival — in game development, web design, and the increasingly relevant problem of displaying images on e-ink screens. The algorithms behind it are elegant, surprisingly varied, and more applicable than you might think.

Why Dithering Works

Your visual system doesn't process individual pixels; it integrates regions. When you look at a fine pattern of alternating black and white pixels from a normal viewing distance, you don't see individual dots. You see gray. A region where 25% of the pixels are white reads as a dark gray, and a region where 75% are white reads as a light gray. The density of white pixels in an area controls the perceived brightness.
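The density claim is easy to check numerically. The sketch below uses plain Python lists as stand-ins for 1-bit images and averages them, which is only a crude model of what the eye does at a distance. The helper name `perceived_gray` is invented here for illustration.

```python
def perceived_gray(patch):
    """Average a 1-bit patch (0 or 255 per pixel) into a single gray value."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))

# Checkerboard: half of the pixels are white
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)]
           for y in range(8)]

# One white pixel in every 2x2 block: a quarter of the pixels are white
sparse = [[255 if (x % 2 == 0 and y % 2 == 0) else 0 for x in range(8)]
          for y in range(8)]

print(perceived_gray(checker))  # 127.5
print(perceived_gray(sparse))   # 63.75
```

Both patches contain only pure black and pure white, yet their averages land at half and a quarter of full brightness.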

This is the same principle behind newspaper halftone printing. Those dots of varying size and spacing create the illusion of continuous tone from pure black ink on white paper. Dithering is the digital equivalent — using pixel patterns instead of dot sizes.

The challenge isn't creating grayscale — that's straightforward. The challenge is doing it without visible artifacts: no banding (abrupt transitions between patterns), no directional biases (patterns that create visible lines), and no moiré effects (interference patterns between the dithering and the display grid).

Threshold Dithering: The Naive Approach

The simplest dithering: compare each pixel's brightness to a threshold (usually 50%). Brighter than the threshold → white. Darker → black. This produces a high-contrast silhouette with no gradation at all. Useful for text and line art, terrible for photographs or 3D rendering.

import numpy as np
from PIL import Image

def threshold_dither(image, threshold=128):
    """Binary threshold: no dithering at all, really."""
    # Convert to 8-bit grayscale, then compare every pixel to one fixed value
    gray = np.array(image.convert('L'), dtype=float)
    return Image.fromarray((gray > threshold).astype(np.uint8) * 255)

Ordered Dithering: Patterns on a Grid

Ordered dithering uses a threshold matrix that varies the threshold across a repeating tile. The classic choice is a Bayer matrix. Instead of comparing every pixel against the same threshold, each pixel is compared against the matrix value at its position within the tile.

def ordered_dither(image):
    """Ordered dithering with a fixed 4×4 Bayer matrix."""
    gray = np.array(image.convert('L'), dtype=float)
    h, w = gray.shape
    # 4×4 Bayer index matrix (values 0-15)
    bayer_4x4 = np.array([
        [ 0,  8,  2, 10],
        [12,  4, 14,  6],
        [ 3, 11,  1,  9],
        [15,  7, 13,  5]
    ])
    # Center each index in its bin and scale to 0-255, so the thresholds
    # run from about 8 to 247 instead of starting at exactly 0
    thresholds = (bayer_4x4 + 0.5) * (255 / 16)
    # Tile the matrix across the image, then crop to the image size
    threshold = np.tile(thresholds, (h // 4 + 1, w // 4 + 1))[:h, :w]
    return Image.fromarray((gray > threshold).astype(np.uint8) * 255)

Ordered dithering produces a regular crosshatch pattern that's immediately recognizable. At low resolutions the repeating grid is visible, giving the image a mechanical, retro quality. In many contexts that is a deliberate aesthetic choice: the look of a 4×4 or 8×8 Bayer matrix is something many games and artists use on purpose.

The Bayer matrix is carefully constructed so that each threshold level adds new pixels in positions maximally spaced from existing ones. This minimizes clumping and gives the most even distribution of dots at every brightness level. It's also extremely fast — one comparison per pixel, no neighbor dependencies — which makes it suitable for real-time rendering.
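The construction described above has a compact recursive form: each larger matrix is built from four scaled copies of the smaller one, offset so that consecutive threshold levels land far apart. Here is a minimal sketch in plain Python. The function name `bayer_matrix` is chosen here for illustration, and it returns raw indices that still need scaling to the pixel range.

```python
def bayer_matrix(n):
    """Return the 2**n x 2**n Bayer index matrix as a list of rows."""
    m = [[0]]
    for _ in range(n):
        size = len(m)
        bigger = [[0] * (2 * size) for _ in range(2 * size)]
        for y in range(size):
            for x in range(size):
                v = 4 * m[y][x]
                bigger[y][x] = v                    # top-left quadrant
                bigger[y][x + size] = v + 2         # top-right quadrant
                bigger[y + size][x] = v + 3         # bottom-left quadrant
                bigger[y + size][x + size] = v + 1  # bottom-right quadrant
        m = bigger
    return m

for row in bayer_matrix(2):
    print(row)
# [0, 8, 2, 10]
# [12, 4, 14, 6]
# [3, 11, 1, 9]
# [15, 7, 13, 5]
```

Calling `bayer_matrix(2)` reproduces the 4×4 matrix used in `ordered_dither`, and `bayer_matrix(3)` gives the 8×8 version with indices 0 through 63.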

Error Diffusion: Floyd-Steinberg and Friends

Error diffusion dithering takes a completely different approach. Instead of using a fixed threshold matrix, it processes pixels sequentially and distributes the 'error' — the difference between the original pixel value and the quantized (black or white) result — to neighboring pixels that haven't been processed yet.

def floyd_steinberg_dither(image):
    """Floyd-Steinberg error diffusion dithering."""
    gray = np.array(image.convert('L'), dtype=float)
    h, w = gray.shape
    for y in range(h):
        for x in range(w):
            old_pixel = gray[y, x]
            new_pixel = 255.0 if old_pixel > 128 else 0.0
            gray[y, x] = new_pixel
            error = old_pixel - new_pixel
            # Distribute error to neighbors that have not been visited yet
            # Floyd-Steinberg diffusion kernel:
            #        * 7/16
            #  3/16 5/16 1/16
            if x + 1 < w:
                gray[y, x + 1] += error * 7 / 16
            if y + 1 < h:
                if x - 1 >= 0:
                    gray[y + 1, x - 1] += error * 3 / 16
                gray[y + 1, x] += error * 5 / 16
                if x + 1 < w:
                    gray[y + 1, x + 1] += error * 1 / 16
    # Every pixel is now exactly 0 or 255, so the cast is lossless
    return Image.fromarray(gray.astype(np.uint8))

Floyd-Steinberg dithering produces much more natural-looking results than ordered dithering. The pixel distribution has no visible grid pattern, and it handles smooth gradients beautifully. The dots cluster organically, creating a look that's closer to photographic halftone printing.

The downside: it's sequential. Each pixel depends on the error from previously processed pixels, so you can't easily parallelize it. This makes it slower than ordered dithering and harder to implement in shader-based rendering pipelines. For pre-processed images, this doesn't matter. For real-time 3D rendering at 60fps, it's a problem.
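Both the appeal of error diffusion and its sequential nature show up in a stripped-down, one-dimensional version. The sketch below is a simplification for illustration, not Floyd-Steinberg itself. It pushes the whole error to the next pixel in the row, and the name `diffuse_row` is made up here.

```python
def diffuse_row(row, threshold=128):
    """1-D error diffusion: hand each pixel's full quantization error
    to the next pixel in the row."""
    out = []
    carry = 0.0  # error inherited from the previous pixel
    for value in row:
        value += carry
        quantized = 255 if value > threshold else 0
        out.append(quantized)
        carry = value - quantized  # must be known before the next pixel
    return out

row = [64] * 16               # a flat gray at about 25% brightness
result = diffuse_row(row)
print(result.count(255))      # 4
print(sum(row), sum(result))  # 1024 1020
```

A quarter of the output pixels come out white, so the total brightness of the row is preserved to within one pixel's worth of error. The `carry` variable is the sequential dependency: no pixel can be decided until its predecessor has been.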

Blue Noise Dithering: The Best of Both Worlds

Blue noise dithering is the current state of the art for many applications. It uses a threshold matrix like ordered dithering (so it's fast and parallelizable) but the matrix values are arranged to produce a blue noise distribution — meaning the dots are randomly placed but with a minimum spacing constraint, avoiding both the clumping of white noise and the grid patterns of ordered dithering.

The term 'blue noise' comes from signal processing. Blue noise has most of its energy at high frequencies — like blue light. In spatial terms, this means the pattern has no large-scale structure (no visible grid, no directional bias) but the dots are evenly spaced at the local level (no clumps, no gaps). It's the pattern you get if you throw darts at a board with the constraint that each dart must be at least some minimum distance from every other dart.
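The dart-throwing picture translates almost literally into code. The following sketch produces a blue-noise-like set of points by rejecting any candidate that lands too close to an accepted one. It illustrates the spacing constraint only, and it is not a way to build a full threshold matrix. The function name and parameters are assumptions made for this example.

```python
import math
import random

def dart_throw(width, height, min_dist, attempts=2000, seed=0):
    """Throw random darts, keeping one only if it is at least min_dist
    away from every dart kept so far."""
    rng = random.Random(seed)
    points = []
    for _ in range(attempts):
        candidate = (rng.uniform(0, width), rng.uniform(0, height))
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points

points = dart_throw(32, 32, min_dist=4.0)
print(len(points), "points, none closer than 4 units to another")
```

This brute-force version checks every candidate against every accepted point, so it only suits small point sets.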

Generating a good blue noise threshold matrix is surprisingly hard — it's an optimization problem that requires iterating until the frequency spectrum matches the desired profile. But once you have the matrix, using it is as fast as Bayer dithering: one comparison per pixel.
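The "one comparison per pixel" step does not care where the matrix came from. As a rough sketch, the function below tiles any threshold matrix over an image held as plain Python lists. The 2×2 matrix in the usage lines is a hand-written stand-in, not real blue noise, and the name `dither_with_matrix` is hypothetical.

```python
def dither_with_matrix(gray, matrix):
    """Threshold a grayscale image against a tiled threshold matrix.

    gray:   rows of pixel values in 0-255
    matrix: rows of thresholds in 0-255 (Bayer, blue noise, or anything else)
    """
    mh, mw = len(matrix), len(matrix[0])
    return [
        [255 if value > matrix[y % mh][x % mw] else 0
         for x, value in enumerate(row)]
        for y, row in enumerate(gray)
    ]

# Stand-in 2x2 threshold tile; a real blue noise matrix would be far larger
stand_in = [[32, 160],
            [224, 96]]
flat_gray = [[128] * 4 for _ in range(4)]
for row in dither_with_matrix(flat_gray, stand_in):
    print(row)
# [255, 0, 255, 0]
# [0, 255, 0, 255]
# [255, 0, 255, 0]
# [0, 255, 0, 255]
```

Swapping in a precomputed blue noise matrix changes only the data passed in, not the per-pixel work.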

Obra Dinn's Approach: Spherical Mapped Dithering

Return of the Obra Dinn's dithering is particularly clever because it solves a problem unique to 3D rendering: temporal stability. When a standard dithering pattern is applied in screen space, moving the camera causes the dithering pattern to 'swim' — the pattern stays fixed to the screen while the 3D geometry moves underneath it. This creates a shimmering effect that's extremely distracting.

Pope's solution: map the dithering pattern onto a sphere surrounding the camera instead of onto the flat screen, and sample it using each pixel's view direction. When the camera rotates, the pattern stays attached to the directions it was mapped to, so it moves across the screen together with the scene rather than sliding over it. Note that the pattern depends on view direction, not on an object's world-space position. That stabilizes camera rotation, but the pattern can still shift relative to the geometry when the camera or an object translates.

This is a technique you can adapt for any 3D application that uses dithering or halftone effects. The key insight: if you want temporal stability, your threshold values should be a function of something stable (world position, view direction) rather than something that changes every frame (screen coordinates).
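To make the idea of a threshold that depends on view direction concrete, here is a minimal sketch that picks a threshold from a direction vector using a plain longitude/latitude mapping. This is an assumption-heavy illustration and not the game's actual mapping. A longitude/latitude grid distorts badly near the poles, the Bayer tile is just a stand-in pattern, and `direction_threshold`, `rotate_yaw`, and `cells_per_turn` are names invented here.

```python
import math

# 4x4 Bayer indices, used here only as a stand-in pattern
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def direction_threshold(direction, cells_per_turn=256):
    """Threshold in (0, 1) chosen from a world-space view direction."""
    x, y, z = direction
    length = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / length, y / length, z / length
    # Longitude and latitude, each scaled to the range 0..1
    u = (math.atan2(x, z) / (2 * math.pi)) % 1.0
    v = math.acos(max(-1.0, min(1.0, y))) / math.pi
    # Latitude spans half a turn, so it gets half as many cells
    col = int(u * cells_per_turn) % 4
    row = int(v * cells_per_turn / 2) % 4
    return (BAYER_4X4[row][col] + 0.5) / 16

def rotate_yaw(direction, angle):
    """Rotate a direction about the vertical (y) axis."""
    x, y, z = direction
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

# The same camera-space ray points along different world directions as the
# camera yaws, so it reads a different part of the pattern each time.
ray = (0.2, 0.1, 1.0)
for yaw in (0.0, 0.3):
    print(direction_threshold(rotate_yaw(ray, yaw)))
```

Because the threshold is a function of the world-space direction alone, a feature seen along a given direction keeps its threshold no matter which screen pixel it lands on after a rotation.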

Practical Applications Beyond Retro Aesthetics

Dithering isn't just for retro game aesthetics. It has real practical applications in modern development.

E-ink displays. E-readers and e-ink signage have limited grayscale capability (typically 16 levels). Dithering is essential for displaying photographs and gradients on these devices. The choice of dithering algorithm directly affects readability and image quality.
Reducing file size. A 1-bit dithered image is 1/8 the raw size of an 8-bit grayscale image. For web delivery of decorative images, dithered PNGs can be dramatically smaller than grayscale alternatives. Some web designers use dithered images deliberately for both aesthetic and performance reasons.
Color quantization. When you reduce a full-color image to a limited palette (GIF, indexed PNG, or palette-constrained displays), dithering prevents the banding that occurs when smooth gradients are mapped to discrete color values. Floyd-Steinberg error diffusion, applied per color channel against the palette, is a common choice in GIF encoders and image editors.
Gradient banding in rendering. Low-precision render targets (8 bits per channel) can produce visible banding in smooth gradients. Adding a small screen-space dither before quantization (noise of about ±0.5 of one 8-bit step is enough) breaks up the banding without adding visible noise. Many modern game engines offer this or apply it by default.
Thermal and receipt printing. Thermal printers are inherently 1-bit — each dot is either heated (black) or not. Dithering converts photographs and graphics to printable form. The choice of algorithm determines whether your receipt image is recognizable or a blob.
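The gradient-banding item can be shown with a few lines of arithmetic. The sketch below quantizes a value to a handful of levels, once with plain rounding and once with a per-pixel offset of up to ±0.5 of a step taken from a 4×4 Bayer tile. Using an ordered pattern as the noise source is a choice made here to keep the example deterministic, and the `quantize` helper is hypothetical.

```python
import math

BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def quantize(value, levels, noise=0.0):
    """Round value (0..1) to one of `levels` evenly spaced steps.

    noise is measured in quantization steps and should lie in -0.5..0.5;
    it is added before rounding.
    """
    scaled = value * (levels - 1) + noise
    step = math.floor(scaled + 0.5)
    step = max(0, min(levels - 1, step))
    return step / (levels - 1)

value, levels = 0.3, 4
plain = quantize(value, levels)

# One offset per pixel of a 4x4 tile, spread evenly over -0.5..0.5
tile = [quantize(value, levels, (BAYER_4X4[y][x] + 0.5) / 16 - 0.5)
        for y in range(4) for x in range(4)]
average = sum(tile) / len(tile)

print(plain, average)
```

Plain rounding snaps every pixel to the same step (about 0.333 here), which is what creates a flat band. With the offsets, most pixels round up and a few round down, so the tile averages about 0.292, closer to the original 0.3.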

Implementing Dithering in a Shader

For real-time applications, ordered dithering and blue noise dithering can be implemented efficiently as fragment shaders.

#version 300 es
// GLSL ES 3.00 fragment shader for Bayer 8x8 ordered dithering.
// The integer bitwise operators used below are not available in
// GLSL ES 1.00 (WebGL 1), so this targets GLSL ES 3.00 (WebGL 2).
precision mediump float;
uniform sampler2D uTexture;
in vec2 vTexCoord;
out vec4 fragColor;

// Bayer 8x8 threshold via bit manipulation (no texture needed)
float bayer8x8(highp vec2 pos) {
    // Position within the 8x8 tile
    ivec2 p = ivec2(pos) & 7;
    // Interleave the bits of (x XOR y) and y, lowest bit first, so the
    // low-order position bits become the high-order bits of the value
    int value = 0;
    int x = p.x ^ p.y;
    int y = p.y;
    for (int i = 0; i < 3; i++) {
        value = value * 4 + (x & 1) * 2 + (y & 1);
        x >>= 1;
        y >>= 1;
    }
    // Center each of the 64 levels so pure black stays fully black
    return (float(value) + 0.5) / 64.0;
}

void main() {
    vec4 color = texture(uTexture, vTexCoord);
    // Rec. 601 luma weights
    float luminance = dot(color.rgb, vec3(0.299, 0.587, 0.114));
    float threshold = bayer8x8(gl_FragCoord.xy);
    float dithered = step(threshold, luminance);
    fragColor = vec4(vec3(dithered), 1.0);
}

For blue noise, you typically precompute the noise texture and sample it in the shader. The texture can be tiled, so a 64×64 or 128×128 blue noise texture covers the entire screen when repeated. The per-pixel cost is one texture sample and one comparison — essentially free on modern GPUs.

Choosing the Right Algorithm

Each dithering approach has a use case where it's the best choice.

Threshold: Text, line art, high-contrast graphics where detail matters more than tonal range.
Ordered (Bayer): Real-time rendering where you want a deliberate retro/halftone aesthetic. Fast, parallelizable, GPU-friendly.
Floyd-Steinberg: Static image processing where quality matters most. Best for photographs and images with smooth gradients.
Blue noise: Real-time rendering where you want natural-looking results without visible patterns. The best general-purpose choice for modern applications.
Spherical (view-direction) mapping: 3D rendering where the pattern must stay stable as the camera rotates. Essential for 1-bit 3D rendering like Obra Dinn.

Dithering is one of those topics where the simplicity of the concept — 'arrange dots to look like shading' — belies the depth of the implementation choices. Each algorithm represents a different trade-off between quality, speed, and visual character. And in an era where e-ink displays, low-bandwidth connections, and retro aesthetics are all relevant, understanding these trade-offs is more practical than ever.