Collaborative Editing Is Harder Than You Think

Google Docs handles real-time collaborative editing for millions of users. It looks effortless — you type, your cursor moves, your collaborator's edits appear instantly. This simplicity is deceptive. Under the hood, collaborative editing is one of the hardest problems in distributed systems, and the two main approaches to solving it (Operational Transformation and CRDTs) each have trade-offs that aren't obvious until you try to build something with them.

The pitch for CRDTs (Conflict-free Replicated Data Types) is seductive: data structures that automatically merge without conflicts, no central server needed, eventual consistency guaranteed by mathematics. But the reality — as several teams have discovered after choosing CRDTs for their collaborative editors — is messier. The theory is beautiful. The engineering is brutal.

The Problem: Concurrent Edits

The fundamental challenge: two users editing the same document simultaneously, with network delay between them. User A inserts 'Hello' at position 5. User B, who hasn't seen A's edit yet, deletes the character at position 5. What should the final document look like?

This is ambiguous. Should B's delete apply to the character that was at position 5 before A's insert, or after? If before, the delete removes the original character and 'Hello' appears. If after, the delete removes the 'H' from 'Hello.' Both interpretations are reasonable. The system has to pick one and ensure all clients converge on the same result — divergence means two users see different documents, which is the one unforgivable failure mode.

The convergence problem:
Initial document: "The quick brown fox"
                             ^ position 10 (0-based: the 'b' in "brown")
User A (at time T):  Insert "very " at position 10
User B (at time T):  Delete character at position 10
A sees locally: "The quick very brown fox"
B sees locally: "The quick rown fox"  (deleted 'b')
Now A receives B's operation: delete at position 10
But A already inserted 5 chars at position 10...
Should B's delete now apply at position 10? (deletes 'v' from 'very')
Or at position 15? (deletes 'b' from 'brown' — the original target)
Both A and B must arrive at the same document.
This is the problem OT and CRDTs solve differently.
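
To make the divergence concrete, here is a minimal sketch in Python. It assumes 0-based positions and two hypothetical helpers, insert and delete, that are not part of any library. Each replica applies its own edit first and then the other user's edit exactly as received, with no adjustment.

```python
def insert(doc: str, pos: int, text: str) -> str:
    """Return doc with text inserted before index pos (0-based)."""
    return doc[:pos] + text + doc[pos:]


def delete(doc: str, pos: int) -> str:
    """Return doc with the single character at index pos removed."""
    return doc[:pos] + doc[pos + 1:]


initial = "The quick brown fox"

# Each replica applies its own edit first, then the remote edit unchanged.
replica_a = delete(insert(initial, 10, "very "), 10)
replica_b = insert(delete(initial, 10), 10, "very ")

print(replica_a)               # The quick ery brown fox
print(replica_b)               # The quick very rown fox
print(replica_a == replica_b)  # False: the replicas have diverged
```

Same two operations, different order, different documents. Everything that follows is about making that last line print True.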

Operational Transformation (OT)

OT, the older approach (1989), solves this by transforming operations against each other. When A receives B's 'delete at position 10,' A's system checks what operations A has applied that B hasn't seen yet. It transforms B's operation to account for those changes: since A inserted 5 characters at position 10, B's delete should now apply at position 15 (the original target character shifted right).

This works, and Google Docs uses it. The problem is the transformation functions. For a text editor with insert and delete operations, you need to define how every ordered pair of operations transforms against each other. With two operation types, that's four transformation cases. Add formatting, tables, images, lists, and comments, and the number of cases grows quadratically with the number of operation types: ten operation types means a hundred cases. Each case must be correct, and subtle bugs cause document divergence that's almost impossible to reproduce and debug.
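
The four insert/delete cases can be sketched as follows. This is an illustrative toy, not the algorithm Google Docs or any OT library uses: the Insert and Delete classes, the site field, and the tie-break rule (the lower site name goes first) are all assumptions made for this example, and a delete always removes exactly one character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # hypothetical replica name, used only to break ties


@dataclass(frozen=True)
class Delete:
    pos: int  # deletes exactly one character


def apply(doc, op):
    if op is None:  # the op was transformed into a no-op
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op, against):
    """Rewrite op so it can be applied after the concurrent op `against`."""
    if isinstance(op, Insert) and isinstance(against, Insert):
        # Case 1: insert vs insert. On equal positions the lower site goes first.
        if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
            return Insert(op.pos + len(against.text), op.text, op.site)
        return op
    if isinstance(op, Insert) and isinstance(against, Delete):
        # Case 2: insert vs delete. A delete to the left shifts the insert left.
        return Insert(op.pos - 1, op.text, op.site) if against.pos < op.pos else op
    if isinstance(op, Delete) and isinstance(against, Insert):
        # Case 3: delete vs insert. An insert at or before the target shifts it right.
        return Delete(op.pos + len(against.text)) if against.pos <= op.pos else op
    # Case 4: delete vs delete.
    if against.pos < op.pos:
        return Delete(op.pos - 1)
    if against.pos == op.pos:
        return None  # both users deleted the same character
    return op


doc = "The quick brown fox"
a = Insert(10, "very ", "A")
b = Delete(10)

replica_a = apply(apply(doc, a), transform(b, a))  # b becomes Delete(15)
replica_b = apply(apply(doc, b), transform(a, b))  # a is unchanged
print(replica_a)  # The quick very rown fox
print(replica_b)  # The quick very rown fox
```

Even this toy needs a tie-break rule and a no-op result, and it only covers single-character deletes. Every new operation type adds a row and a column to the case table.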

OT also traditionally requires a central server to establish a total ordering of operations. Without a server, concurrent operations can be transformed in different orders by different clients, producing different results. Google can afford a central server for Docs. A peer-to-peer application can't use OT easily.

CRDTs: The Theoretical Promise

CRDTs take a fundamentally different approach. Instead of transforming operations, they use data structures where all possible merge orders produce the same result. This is a mathematical guarantee — if the data structure is a valid CRDT, convergence is automatic. No transformation functions, no central server, no ordering dependency.

For text editing, the key CRDT variants are RGA (Replicated Growable Array) and similar sequence CRDTs. Instead of tracking positions (which shift as edits occur), they assign each character a unique, globally ordered identifier. An insert creates a new ID and records where it sits relative to existing IDs. A delete marks an ID as tombstoned rather than removing it (more on this later).

CRDT sequence representation:
Document: "cat"
Internal representation (simplified):
ID: (A,1) → 'c'   (user A, logical clock 1)
ID: (A,2) → 'a'   (user A, logical clock 2)
ID: (A,3) → 't'   (user A, logical clock 3)
User B inserts 'h' between 'c' and 'a':
ID: (B,1) → 'h'   position: between (A,1) and (A,2)
Document: "chat"
User A (concurrently) inserts 'r' between 'c' and 'a':
ID: (A,4) → 'r'   position: between (A,1) and (A,2)
Both IDs go between the same characters.
The CRDT uses a deterministic tie-breaking rule
(e.g., higher user ID wins) to order them.
Final document: "chrat" or "crhat", depending on the tie-break
(deterministic — all clients get the same result)
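
A rough RGA-style sketch of the "cat" example, in Python. The Replica class and its method names are hypothetical, and deletes are left out. One deliberate difference from the listing above: IDs here are (counter, site) pairs with a Lamport-style counter, so the two concurrent inserts get IDs (4, "A") and (4, "B") rather than (A,4) and (B,1). That is an assumption of this sketch; it keeps the simple skip rule in integrate correct.

```python
class Replica:
    """One site's copy of the sequence; items stay in document order."""

    def __init__(self, site):
        self.site = site
        self.clock = 0   # Lamport-style counter (an assumption of this sketch)
        self.items = []  # [id, char, deleted] entries

    def text(self):
        return "".join(ch for _, ch, deleted in self.items if not deleted)

    def _index(self, item_id):
        for i, item in enumerate(self.items):
            if item[0] == item_id:
                return i
        raise KeyError(item_id)

    def insert_after(self, parent_id, char):
        """Create a local insert and return the op to send to other replicas."""
        self.clock += 1
        op = ((self.clock, self.site), parent_id, char)
        self.integrate(op)
        return op

    def integrate(self, op):
        new_id, parent_id, char = op
        self.clock = max(self.clock, new_id[0])
        i = 0 if parent_id is None else self._index(parent_id) + 1
        # Tie-break: existing items with a higher ID stay closer to the parent.
        while i < len(self.items) and self.items[i][0] > new_id:
            i += 1
        self.items.insert(i, [new_id, char, False])


a, b = Replica("A"), Replica("B")
parent = None
for ch in "cat":  # A types "cat" and B receives each op
    op = a.insert_after(parent, ch)
    b.integrate(op)
    parent = op[0]

c_id = a.items[0][0]
op_h = b.insert_after(c_id, "h")  # concurrent: neither has seen the other's insert
op_r = a.insert_after(c_id, "r")
a.integrate(op_h)
b.integrate(op_r)

print(a.text(), b.text())  # chrat chrat
```

The two replicas integrate the concurrent inserts in opposite orders and still agree, because the ordering depends only on the IDs, never on arrival order.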

CRDTs: The Practical Reality

The CRDT promise — conflict-free, decentralized, mathematically guaranteed convergence — is real. But the engineering challenges are significant.

Tombstones accumulate. When you delete a character in a CRDT, it can't be removed from the data structure — other replicas might not have seen it yet and need the ID to correctly position their own edits. So deleted characters are marked as tombstones: invisible but still present. A document that's been heavily edited accumulates thousands of tombstones. A 1000-character document might have 50,000 tombstones from editing history. This bloats memory and slows operations.
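
A small sketch of how the tombstone count grows, assuming a hypothetical TombstoneText class that keeps only a character and a deleted flag per entry (IDs are omitted to keep it short). Rewriting one 20-character sentence 50 times reproduces the 50-to-1 ratio from the example above.

```python
class TombstoneText:
    def __init__(self):
        self.items = []  # [char, deleted] pairs in document order

    def type_text(self, text):
        self.items.extend([ch, False] for ch in text)

    def delete_visible(self):
        for item in self.items:
            item[1] = True  # mark as deleted, never remove

    def text(self):
        return "".join(ch for ch, deleted in self.items if not deleted)

    def tombstones(self):
        return sum(1 for _, deleted in self.items if deleted)


doc = TombstoneText()
sentence = "The quick brown fox."
doc.type_text(sentence)
for _ in range(50):  # rewrite the same sentence 50 times
    doc.delete_visible()
    doc.type_text(sentence)

print(len(doc.text()), doc.tombstones())  # 20 1000
```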

Metadata overhead is massive. Each character needs a unique ID (user identifier + logical timestamp), pointers to neighboring IDs, and tombstone flags. The metadata per character can be 50-100 bytes. For a 100KB document, the CRDT representation might be 5-10MB. This matters for sync — sending the full CRDT state over the network is expensive.
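
The arithmetic behind those numbers, as a quick check. It assumes one byte per character and decimal megabytes, and crdt_size_mb is a made-up helper for this example.

```python
def crdt_size_mb(doc_chars, metadata_bytes_per_char):
    """Estimate metadata size in MB for a document of doc_chars characters."""
    return doc_chars * metadata_bytes_per_char / 1_000_000


low = crdt_size_mb(100_000, 50)    # 100KB document, 50 bytes of metadata per character
high = crdt_size_mb(100_000, 100)  # same document, 100 bytes per character
print(low, high)  # 5.0 10.0
```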

Performance degrades with history. Operations on a sequence CRDT aren't O(1) — inserting a character requires finding the correct position in the ID-ordered sequence, which depends on the total number of IDs (including tombstones). For documents with long editing histories, this gets measurably slow.
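
To see why, consider the most naive layout: a flat list of (char, deleted) entries that has to be scanned to turn a visible cursor position into an internal one. The locate function below is a hypothetical helper for this sketch. Production libraries typically add indexes or caches to avoid a full scan, so treat this as the worst case rather than as how any particular library behaves.

```python
def locate(items, visible_index):
    """Return (internal_index, entries_scanned) for the visible_index-th live character."""
    seen = -1
    for steps, (_, deleted) in enumerate(items, start=1):
        if not deleted:
            seen += 1
            if seen == visible_index:
                return steps - 1, steps
    raise IndexError(visible_index)


live = [("x", False)] * 1000
tombstones = [("x", True)] * 50_000

print(locate(live, 999))               # (999, 1000)
print(locate(tombstones + live, 999))  # (50999, 51000)
```

The visible document is identical in both cases. The second one just has 50,000 invisible entries in the way.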

Intention preservation is hard. CRDTs guarantee convergence — all replicas reach the same state. But do they reach the right state? When two users concurrently edit the same word, the CRDT interleaves their characters deterministically. The result is convergent but might be nonsensical. OT can be designed to keep one user's edit intact and apply the other's around it. CRDTs merge mechanically.
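
Here is a toy illustration of interleaving. It is not RGA: it assumes a position-based scheme in which each character gets a fractional position halfway between its neighbors, with the site name as tie-breaker. Different sequence CRDTs behave differently here, and the type_word helper is hypothetical.

```python
from fractions import Fraction


def type_word(word, site, left=Fraction(0), right=Fraction(1)):
    """Give each character a position between left and right, typing left to right."""
    items = []
    for ch in word:
        pos = (left + right) / 2
        items.append(((pos, site), ch))
        left = pos  # the next character goes after this one
    return items


# Two users type a word at the same spot without seeing each other's edits.
alice = type_word("cat", "A")
bob = type_word("dog", "B")

merged = "".join(ch for _, ch in sorted(alice + bob))
print(merged)  # cdaotg
```

Every replica that sorts by ID ends up with "cdaotg". That is convergence. It is not what either user meant.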

Yjs and the Practical Middle Ground

Yjs is the most popular CRDT library for JavaScript and powers collaborative features in many web applications. It's well-engineered and addresses many of the theoretical CRDT problems with practical optimizations — compact binary encoding reduces metadata overhead, and garbage collection of tombstones works when all clients are online.

But Yjs isn't magic. Teams that adopt it discover that a CRDT library solves the convergence problem but not the collaboration problem. You still need: a signaling server for peer discovery, a persistence layer for offline changes, conflict resolution for high-level operations (what happens when two users restructure the same section?), presence tracking (cursors, selections), undo/redo that respects other users' edits, and permission management.

Some teams, after building a collaborative editor on Yjs, have concluded that the CRDT approach adds complexity they don't need. If your application has a central server (and most do), OT or even simpler approaches (last-write-wins with conflict detection) might be more practical. CRDTs shine for truly peer-to-peer scenarios — offline-first applications, local-first software, and systems where no central authority can be assumed.

What Most Applications Should Do

If you're adding collaborative editing to your application, here's the pragmatic decision framework.

If you have a central server and your documents are small: Use OT. Google's approach works. Libraries like ShareDB implement OT for Node.js. The central server simplifies everything — ordering, persistence, conflict resolution, permissions.
If you need offline support or peer-to-peer: Use CRDTs (Yjs or Automerge). CRDTs are the approach built for true offline editing and peer-to-peer sync, because replicas can merge without a server deciding the order. Accept the metadata overhead and tombstone accumulation as the cost of decentralization.
If your collaborators rarely edit the same section simultaneously: You might not need either. Simple locking (only one editor per section) or last-write-wins with good conflict UI covers many real-world collaboration patterns without the complexity of OT or CRDTs.
If you're building a Google Docs competitor: You need a team of distributed systems engineers and years of development. This is not a weekend project. The surface simplicity of collaborative editing hides extraordinary complexity.
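
For the "you might not need either" case, here is a minimal sketch of last-write-wins with conflict detection, assuming an in-memory store and per-section version numbers. SectionStore and Conflict are hypothetical names; a real application would back this with a database and show the conflict to the user.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale version of the section."""


class SectionStore:
    def __init__(self):
        self._sections = {}  # section_id -> (version, text)

    def read(self, section_id):
        return self._sections.get(section_id, (0, ""))

    def write(self, section_id, text, base_version):
        version, _ = self.read(section_id)
        if base_version != version:
            raise Conflict(f"{section_id}: based on v{base_version}, current is v{version}")
        self._sections[section_id] = (version + 1, text)
        return version + 1


store = SectionStore()
version, _ = store.read("intro")  # both users start from version 0
store.write("intro", "Alice's draft", version)

try:
    store.write("intro", "Bob's draft", version)  # stale base version
    outcome = "saved"
except Conflict:
    outcome = "conflict"

print(outcome)              # conflict
print(store.read("intro"))  # (1, "Alice's draft")
```

Bob's write is rejected instead of silently overwriting Alice's, and the application decides what to show him. No transformation functions, no tombstones.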

The Honest Assessment

Collaborative editing is one of those problems where the difficulty is non-obvious. The happy path — two users making non-overlapping edits — works with almost any approach. The hard cases — concurrent edits to the same region, offline editing with later sync, complex document structures (tables, nested lists, embedded objects) — require deep understanding of the trade-offs between OT and CRDTs.

Neither approach is strictly better. OT is simpler with a server, harder without one. CRDTs work without a server but carry metadata overhead and tombstone accumulation. Both require careful engineering beyond the core algorithm — and that engineering (presence, persistence, permissions, undo) is often harder than the convergence problem itself.

The industry is slowly converging on a pragmatic hybrid: CRDTs for the data model (automatic merge guarantees), with a server for coordination (presence, permissions, garbage collection). This gives you the best of both worlds — mathematical convergence plus practical infrastructure. But it also gives you the complexity of both worlds, which is why collaborative editing remains a hard problem after 35 years of research.