μpb Arena Fusion and References

μpb Arena Fusion and References

μpb generally follows a thread-compatibility model where only operations on a const pointer may occur concurrently from multiple threads; non-const operations must not race with each other or operations on const pointers.

Fusion, via upb_Arena_Fuse, binds the lifetime of two arenas together, such that none are freed until all of the transitively fused arenas reach refcount 0. This is useful to avoid dangling pointers when a message in one arena references a message in another - such as when a child message is constructed in isolation then set onto a parent message - by fusing the parent’s arena with the child’s, the child’s lifetime is tied to the parent’s without needing to copy it.

Both modifying the refcount and fusing are thread safe; if you want to conceptually perform allocations concurrently from multiple threads with the same arena lifetime, one way to achieve that would be to share a const upb_Arena* parent plus a dedicated arena per thread, and fuse the thread-specific arena with the parent arena. By contrast, reference counting cannot help to achieve concurrent allocations on multiple threads, it only helps in synchronizing lifetimes from multiple things observing the same arena (which may be a non-const pointer for multiple writers in a single threaded context, or a const pointer for multiple readers in a multithreaded context).

To be usable everywhere, fusion has a lock-free implementation, documented below.

Data Structure

Each arena has three pointer sized members used to track the relationship between arenas. Together, they implement a hybrid disjoint set forest and doubly-linked list.

// Tagged pointer - tracked as black arrows in diagrams
UPB_ATOMIC(uintptr_t) parent_or_count;
// Linked list - tracked as red arrows in diagrams
UPB_ATOMIC(struct upb_ArenaInternal*) next;
// Linked list - previous tracked as blue arrows in diagrams, tail as dashed
UPB_ATOMIC(uintptr_t) previous_or_tail;

The disjoint set is used to check whether two arenas are already fused and to provide agreement on a single reference count across all fused arenas. The linked list is used to implement freeing arenas once the refcount reaches 0, and to implement upb_Arena_SpaceAllocated.

Two arenas are considered fused when the disjoint set says they are; the linked list is guaranteed to converge before the refcount reaches 0, but may only track a subset of fused arenas while racing with fuse calls.

Finding the root

Each set of fused arenas is uniquely identified by the root node of its tree. To find the root for a given arena, traverse the parent_or_count pointer until you reach a node with a count instead of a parent pointer - that’s the root. To avoid expensive lookups if frequently repeated, this data structure uses path splitting to halve the distance from the root for every node traversed. Repeatedly finding the root for a series of nodes that are part of the same fused set will converge to O(1) quickly. Let’s find the root of D, starting with:

digraph {
C [label="C (next)"]
D [label="D (current)"]
A -> B [dir=back]
B -> C [dir=back]
C -> D [dir=back]
}

We traverse our parents; each time we pass one we link it to its grandparent, so that future lookups are faster.

digraph {
B [label="B (next)"]
C [label="C (current)"]
A -> B [dir=back]
B -> C [dir=back]
B -> D [dir=back]
}

Then:

digraph {
A [label="A (next)"]
B [label="B (current)"]
A -> B [dir=back]
A -> C [dir=back]
B -> D [dir=back]
}

D and C’s paths got shorter; if we query D’s root again, all subsequent lookups will only require one step.

Fusing

Fusing starts by identifying the roots of each arena to fuse - if they already share a root, there’s nothing to be done. We’ll work through the example of fusing C and D - which is the same as fusing A and B, as they’re the roots.

digraph {
A [label="A (refs=2)"]
B [label="B (refs=3)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
}

A and B are roots, so they store a refcount in their parent_or_count and the start of the linked list in their next pointers; C and D are not, so they store pointers to their parent node in parent_or_count. C and D store NULL in next as they happen to be the last nodes in their set.

Pass refcount

As soon as the parent changes, all future refcount operations will switch to the new root. To avoid decrementing to 0 while the old root still has active references, we add the lower-addressed node’s refcount to the new root.

digraph {
A [label="A (refs=5)"]
B [label="B (refs=3)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
}

The total transferred refcount is tracked throughout the operation so it can be fixed up at the end if retries resulted in over-transfer.

Union

Now the arena with a lower address becomes the new root (A in this case), by compare-and-exchange the higher address arena’s parent_or_count pointer.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
A -> B [dir=back]
}

This atomically eliminates B’s refcount and makes it one of A’s children. Any change to B’s refcount (or fusing it to a different arena with a lower address) will make the swap fail, and the process starts again from the beginning.

Refcount fixups

If B’s refcount changed while performing the union, we increase or decrease A’s refcount by the difference between the initial refcount we added and B’s final refcount, observed as part of the compare-and-exchange of B’s parent_or_count.

Fuse linked lists

To fuse the linked lists, we need to make A’s tail point to B. The root node is always the start of the linked list, so that the list can’t contain cycles. From A’s tail pointer, we traverse until we hit the end (C in this case), and CAS its next to B.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> C [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
A -> B [dir=back]
C -> B [weight=0.001, color="red"]
}

With this, B is now reachable from the root and the final free operation will be able to see it, traversing A->C->B->D.

Update tail pointer

If we stopped at the previous step, it’s easy to see how fusing many arenas could result in O(n²) behavior, as we’d spend all of our time traversing the linked list from the root to find the tail. To avoid this, we update A’s tail pointer to point to B’s tail - so the next fuse operation will not have to traverse C or B.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> D [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> D [style=dashed]
A -> B [dir=back]
C -> B [weight=0.001, color="red"]
}

Update back links

To make it possible to always traverse the list in both directions for upb_Arena_SpaceAllocated, we need to make the matching back link for our doubly-linked list. Since we linked C to B, we need to link B to C via B’s previous_or_tail pointer.

digraph {
A [label="A (refs=5)"]
A -> C [dir=back]
A -> C [weight=0.001, color="red"]
C -> A [weight=0.001, color="blue"]
A -> D [style=dashed]
B -> D [dir=back]
B -> D [weight=0.001, color="red"]
D -> B [weight=0.001, color="blue"]
B -> C [weight=0.001, color="blue"]
A -> B [dir=back]
C -> B [weight=0.001, color="red"]
}

One-Way References

Fusion creates a bi-directional lifetime dependency between arenas: fused arenas will have the same lifetime. In some cases, only a one-way dependency is needed. If messages in arena A contain pointers to messages in arena B but not the reverse, then arena B only needs to live at least as long as arena A.

upb_Arena_RefArena(A, B) establishes such a one-way dependency by having arena A increment B’s refcount. When arena A is freed, it releases its reference on B. This is implemented by allocating a special block in A that holds a pointer to B; this block is processed during upb_Arena_Free(A). Care is taken to ensure that these blocks are processed before the memory blocks containing them are freed from the underlying allocator.

Unlike fusion, upb_Arena_RefArena(A, B) is not thread-safe when called concurrently with other operations on arena A, but it is thread-safe on B.

It is an error to create cycles of references (e.g., upb_Arena_RefArena(A, B); upb_Arena_RefArena(B, A)), or to create a reference between arenas that are fused. Since fusion creates a bi-directional dependency, fusing arenas A and B and calling upb_Arena_RefArena(B, A) would create a cycle A <-> B -> A. In debug builds, upb checks for such cycles, with the implementation documented below.

Counting allocated space

Traversing the linked list while the arenas are still live can be tricky, as we can’t necessarily reach our own node via the linked list from the root. Instead, we scan the linked list in both directions starting from the caller-specified node. This makes space counting weakly consistent - it’s possible to see that A and B are fused, but not see A’s space reflected in SpaceAllocated(B) if the fuse operation that joined them is still in progress. But counting allocated space is always consistent with itself and any fully complete fuses - nodes are only appended or prepended to the list, so starting the traversal at the same point each time guarantees that future scans see a superset of previous ones. Referenced arenas are not included in the count of allocated space.

Detecting reference cycles

To help prevent memory leaks from uncollectable arenas, upb checks for cycles in debug builds when creating a reference or fusing arenas. A cycle can be formed purely by references (eg. A->B->A) or by a combination of references and fusions (eg. Fuse(A, B), then RefArena(B, A) creates the cycle A<->B->A). If such cycles were permitted, the arenas involved could never be freed.

After performing a fusion or adding a reference, the cycle detector runs. This check is performed after the operation because cycle detection cannot be performed atomically; if the check was done prior to the fusion or reference, we could miss a case where two concurrent operations resulted in a cycle. Both would check for a possible cycle, find none, and then proceed.

Consider arenas with references A->B->C.

digraph {
  rankdir=LR;
  A; B; C;
  A -> B [label="ref", color=blue];
  B -> C [label="ref", color=blue];
}

If we attempt RefArena(C, A), the call will add ref C->A, then check that C is not reachable from A.

digraph {
  rankdir=LR;
  A -> B [label="ref", color=blue];
  B -> C [label="ref", color=blue];
  C -> A [label="ref", color=blue];
}

The traversal will reveal A -> B -> C and trigger an assertion failure.

If instead attempt Fuse(A, C), fusion will occur:

digraph {
  rankdir=LR;
  node [shape=box, style=rounded];
  A -> B [label="ref", color=blue];
  B -> C [label="ref", color=blue];
  C -> A [label="fuse", dir=both, color=red];
}

The traversal will find C <-> A -> B -> C, and trigger an assertion failure.

Cycle detection algorithm

Cycles are detected with a non-memoizing recursive depth-first search (DFS). A path can traverse fusions bi-directionally and references uni-directionally to find a cycle that contains at least one unidirectional edge. While this is not asymptotically optimal as it may traverse the same set of nodes numerous times, it doesn’t require allocating memory, making it less intrusive as a debug-only check. It also may result in infinite recursion if a cycle is created on one thread while another thread is performing cycle checks, but that would have caused an assertion failure anyway.

Fast check for fusion

If a directional ref was added and from and to are already fused (upb_Arena_IsFused(from, to) returns true), they are mutually reachable, so a path containing a directed edge exists and the assertion fails.

Traversing fusion group members

To check all references originating from from’s fusion group, we must visit every arena fused with from. Because fusion operations can race with this check, we cannot rely on traversing from the fusion root, which might change. Instead, like upb_Arena_SpaceAllocated, the algorithm finds members of from’s fusion group by first traversing backwards from from using previous_or_tail, then iterates forwards.

Traverse fusion group and DFS on references

From the head of the list segment, we traverse forwards using next to visit all nodes in from’s fusion group. For each arena X in this group, we iterate through all of its outgoing references (the list of arenas Y such that RefArena(X, Y) was called). For each such reference X -> Y:

If Y == to, then to is directly referenced, so a path exists. We return true and eventually trigger an assertion failure.
If Y != to, we recurse to continue the depth-first search from Y, finding members of its fusion group and inspecting their outgoing references. If this recursive call returns true, then to is reachable via Y, and we return true to trigger an assertion failure.

If we explore all arenas in the fusion group and all their transitive references without finding a path to to, no path exists, and we return false.

PGXN

PostgreSQL Extension Network

Contents