Snapshot Diffing

Sometimes we can't intercept changes as they happen. Perhaps we're working with a library that creates its own objects. Perhaps we need to track changes across complex operations that modify many things at once. In these cases, we take a different approach: capture the state before, capture it after, and compute what changed.

The Approach

Snapshot diffing: capture state before and after, then compute what changed.

The algorithm:

  1. Take a snapshot before user actions
  2. Let the user do whatever they want
  3. Take a snapshot after
  4. Compute the diff between snapshots
  5. Convert the diff to a ChangeSet

Chess State Snapshots

Let's represent a chess game state:

    initialState = {
        pieces: {
            "e1": { type: "king", color: "white", hasMoved: false },
            "d1": { type: "queen", color: "white" },
            "a1": { type: "rook", color: "white", hasMoved: false },
            "h1": { type: "rook", color: "white", hasMoved: false },
            "e8": { type: "king", color: "black", hasMoved: false },
            // ... more pieces
        },
        turn: "white",
        moveCount: 0,
        capturedPieces: { white: [], black: [] }
    }

After the user moves the king from e1 to e2:

    afterMove = {
        pieces: {
            // "e1" key is gone!
            "e2": { type: "king", color: "white", hasMoved: true },
            "d1": { type: "queen", color: "white" },
            "a1": { type: "rook", color: "white", hasMoved: false },
            "h1": { type: "rook", color: "white", hasMoved: false },
            "e8": { type: "king", color: "black", hasMoved: false },
        },
        turn: "black",     // changed
        moveCount: 1,      // changed
        capturedPieces: { white: [], black: [] }
    }

The Diff Algorithm

erlang
    function diff(before, after, path = ""):
        changes = []
        
        // Handle primitives
        if isPrimitive(before) or isPrimitive(after):
            if before != after:
                changes.push({
                    type: "Set",
                    path: path,
                    oldValue: before,
                    newValue: after
                })
            return changes
        
        // Collect all keys from both objects
        allKeys = union(keys(before), keys(after))
        
        for key in allKeys:
            childPath = path ? path + "." + key : key
            inBefore = hasField(before, key)
            inAfter = hasField(after, key)
            
            if inBefore and not inAfter:
                // Key was deleted
                changes.push({
                    type: "Delete",
                    path: childPath,
                    oldValue: get(before, key)
                })
            else if not inBefore and inAfter:
                // Key was added
                changes.push({
                    type: "Create",
                    path: childPath,
                    newValue: get(after, key)
                })
            else:
                // Key exists in both - recurse
                childChanges = diff(
                    get(before, key), 
                    get(after, key), 
                    childPath
                )
                changes = concat(changes, childChanges)
        
        return changes

Diffing the Chess Move

sql
    changes = diff(initialState, afterMove)
    
    // Result:
    [
        { type: "Delete", path: "pieces.e1", 
          oldValue: { type: "king", color: "white", hasMoved: false } },
        
        { type: "Create", path: "pieces.e2", 
          newValue: { type: "king", color: "white", hasMoved: true } },
        
        { type: "Set", path: "turn", 
          oldValue: "white", newValue: "black" },
        
        { type: "Set", path: "moveCount", 
          oldValue: 0, newValue: 1 }
    ]
Visualizing the diff: deleted fields (red), created fields (green), and changed values (yellow).

From Raw Diff to Semantic Operations

The raw diff is low-level. We want semantic operations:

javascript
    function diffToChangeSet(changes, metadata):
        return {
            timestamp: now(),
            metadata: metadata,
            changes: changes.map(interpretChange)
        }
    
    function interpretChange(change):
        // Detect piece movement pattern
        if change.type == "Delete" and startsWith(change.path, "pieces."):
            square = extractSquare(change.path)
            return {
                type: "PieceRemoved",
                square: square,
                piece: change.oldValue
            }
        
        if change.type == "Create" and startsWith(change.path, "pieces."):
            square = extractSquare(change.path)
            return {
                type: "PiecePlaced", 
                square: square,
                piece: change.newValue
            }
        
        // Default: use raw change
        return change
    
    changeSet = diffToChangeSet(changes, "White king e1 to e2")

Detecting Moves

We can be even smarter---detecting that a delete + create is actually a move:

python
    function interpretChangesAsMoves(changes):
        pieceRemovals = filter(changes, c => c.type == "Delete" 
                                         and startsWith(c.path, "pieces."))
        piecePlacements = filter(changes, c => c.type == "Create"
                                          and startsWith(c.path, "pieces."))
        
        moves = []
        otherChanges = []
        
        for removal in pieceRemovals:
            // Find matching placement (same piece type and color)
            removedPiece = removal.oldValue
            matching = find(piecePlacements, p => 
                p.newValue.type == removedPiece.type and
                p.newValue.color == removedPiece.color)
            
            if matching:
                moves.push({
                    type: "Move",
                    piece: removedPiece.type,
                    color: removedPiece.color,
                    from: extractSquare(removal.path),
                    to: extractSquare(matching.path)
                })
                piecePlacements = remove(piecePlacements, matching)
            else:
                otherChanges.push(removal)
        
        return concat(moves, otherChanges, piecePlacements)

Result:

sql
    [
        { type: "Move", piece: "king", color: "white", 
          from: "e1", to: "e2" },
        { type: "Set", path: "turn", 
          oldValue: "white", newValue: "black" },
        { type: "Set", path: "moveCount", 
          oldValue: 0, newValue: 1 }
    ]

Proxies vs Diffing

When should you use each approach?

\textwidth
Proxy TrackingSnapshot Diffing
TimingChanges captured immediatelyChanges computed after the fact
OverheadPer-operation costPer-snapshot cost
ControlNeed to wrap objectsWorks on any data
GranularitySees every intermediate stepSees only final result
MemoryMinimalMust store snapshots

Use proxies when:

  • You control the object creation
  • You need immediate notification
  • Intermediate states matter

Use diffing when:

  • Working with external libraries
  • Only final results matter
  • You're comparing across time (like Git)

Optimizing Diffs

For large objects, full diffing can be expensive. Common optimizations:

Structural sharing: If a subtree hasn't changed, don't recurse into it.

erlang
    if before.pieces === after.pieces:
        // Same reference means no changes in pieces
        // Skip the entire subtree

Checksums: Compute a hash of subtrees; compare hashes first.

erlang
    if hash(before.board) == hash(after.board):
        // Board unchanged, skip detailed diff

Incremental diffing: Track which parts might have changed.

erlang
    // Only diff the parts that were touched
    diff(before.pieces, after.pieces)
    // Skip turn, moveCount if we know they didn't change

We've seen two ways to detect changes: proxies that intercept them live, and diffing that computes them afterward. But some concerns don't fit neatly into either model---they cut across the entire system. The next chapter explores aspects: a way to weave behavior throughout a codebase.