μpb Arena Fusion

μpb generally follows a thread-compatibility model where only operations on a const pointer may occur concurrently from multiple threads; non-const operations must not race with each other or operations on const pointers.

Fusion, via upb_Arena_Fuse, binds the lifetime of two arenas together, such that none are freed until all of the transitively fused arenas reach refcount 0. This is useful to avoid dangling pointers when a message in one arena references a message in another - such as when a child message is constructed in isolation then set onto a parent message - by fusing the parent‘s arena with the child’s, the child‘s lifetime is tied to the parent’s without needing to copy it.

Both modifying the refcount and fusing are thread safe; if you want to conceptually perform allocations concurrently from multiple threads with the same arena lifetime, one way to achieve that would be to share a const upb_Arena* parent plus a dedicated arena per thread, and fuse the thread-specific arena with the parent arena. By contrast, reference counting cannot help to achieve concurrent allocations on multiple threads, it only helps in synchronizing lifetimes from multiple things observing the same arena (which may be a non-const pointer for multiple writers in a single threaded context, or a const pointer for multiple readers in a multithreaded context).

To be usable everywhere, it has a lock-free implementation, documented below.

Data Structure

Each arena has three pointer sized members used to track the relationship between arenas. Together, they implement a hybrid disjoint set forest and doubly-linked list.

// Tagged pointer - tracked as black arrows in diagrams
UPB_ATOMIC(uintptr_t) parent_or_count;
// Linked list - tracked as red arrows in diagrams
UPB_ATOMIC(struct upb_ArenaInternal*) next;
// Linked list - previous tracked as blue arrows in diagrams, tail as dashed
UPB_ATOMIC(uintptr_t) previous_or_tail;

The disjoint set is used to check whether two arenas are already fused and to provide agreement on a single reference count across all fused arenas. The linked list is used to implement freeing arenas once the refcount reaches 0, and to implement upb_Arena_SpaceAllocated.

Two arenas are considered fused when the disjoint set says they are; the linked list is guaranteed to converge before the refcount reaches 0, but may only track a subset of fused arenas while racing with fuse calls.

Finding the root

Each set of fused arenas is uniquely identified by the root node of its tree. To find the root for a given arena, traverse the parent_or_count pointer until you reach a node with a count instead of a parent pointer - that‘s the root. To avoid expensive lookups if frequently repeated, this data structure uses path splitting to halve the distance from the root for every node traversed. Repeatedly finding the root for a series of nodes that are part of the same fused set will converge to O(1) quickly. Let’s find the root of D, starting with:

digraph {
C [label="C (next)"]
D [label="D (current)"]
A -> B [dir=back]
B -> C [dir=back]
C -> D [dir=back]
}

We traverse our parents; each time we pass one we link it to its grandparent, so that future lookups are faster.

digraph {
B [label="B (next)"]
C [label="C (current)"]
A -> B [dir=back]
B -> C [dir=back]
B -> D [dir=back]
}

Then:

digraph {
A [label="A (next)"]
B [label="B (current)"]
A -> B [dir=back]
A -> C [dir=back]
B -> D [dir=back]
}

D and C‘s paths got shorter; if we query D’s root again, all subsequent lookups will only require one step.

Fusing

Fusing starts by identifying the roots of each arena to fuse - if they already share a root, there‘s nothing to be done. We’ll work through the example of fusing C and D - which is the same as fusing A and B, as they're the roots.

digraph {
A [label="A (refs=2)"]
B [label="B (refs=3)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
}

A and B are roots, so they store a refcount in their parent_or_count and the start of the linked list in their next pointers; C and D are not, so they store pointers to their parent node in parent_or_count. C and D store NULL in next as they happen to be the last nodes in their set.

Pass refcount

As soon as the parent changes, all future refcount operations will switch to the new root. To avoid decrementing to 0 while the old root still has active references, we add the lower-addressed node's refcount to the new root.

digraph {
A [label="A (refs=5)"]
B [label="B (refs=3)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
}

The total transferred refcount is tracked throughout the operation so it can be fixed up at the end if retries resulted in over-transfer.

Union

Now the arena with a lower address becomes the new root (A in this case), by compare-and-exchange the higher address arena's parent_or_count pointer.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
A -> B [dir=back]
}

This atomically eliminates B‘s refcount and makes it one of A’s children. Any change to B's refcount (or fusing it to a different arena with a lower address) will make the swap fail, and the process starts again from the beginning.

Refcount fixups

If B‘s refcount changed while performing the union, we increase or decrease A’s refcount by the difference between the initial refcount we added and B‘s final refcount, observed as part of the compare-and-exchange of B’s parent_or_count.

Fuse linked lists

To fuse the linked lists, we need to make A‘s tail point to B. The root node is always the start of the linked list, so that the list can’t contain cycles. From A's tail pointer, we traverse until we hit the end (C in this case), and CAS its next to B.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
A -> B [dir=back]
C -> B [weight=0.001, color="red"]
}

With this, B is now reachable from the root and the final free operation will be able to see it, traversing A->C->B->D.

Update tail pointer

If we stopped at the previous step, it‘s easy to see how fusing many arenas could result in O(n^2) behavior, as we’d spend all of our time traversing the linked list from the root to find the tail. To avoid this, we update A‘s tail pointer to point to B’s tail - so the next fuse operation will not have to traverse C or B.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> D [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
A -> B [dir=back]
C -> B [weight=0.001, color="red"]
}

Update back links

To make it possible to always traverse the list in both directions for upb_Arena_SpaceAllocated, we need to make the matching back link for our doubly-linked list. Since we linked C to B, we need to link B to C via B's previous_or_tail pointer.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> D [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> C [weight=0.001, color="blue"]
A -> B [dir=back]
C -> B [weight=0.001, color="red"]
}

Counting allocated space

Traversing the linked list while the arenas are still live can be tricky, as we can‘t necessarily reach our own node via the linked list from the root. Instead, we scan the linked list in both directions starting from the caller-specified node. This makes space counting weakly consistent - it’s possible to see that A and B are fused, but not see A's space reflected in SpaceAllocated(B) if the fuse operation that joined them is still in progress. But counting allocated space is always consistent with itself and any fully complete fuses - nodes are only appended or prepended to the list, so starting the traversal at the same point each time guarantees that future scans see a superset of previous ones.