| Data Structures |
| ############### |
| |
| Zephyr provides a library of common general purpose data structures |
| used within the kernel, but useful by application code in general. |
| These include list and balanced tree structures for storing ordered |
| data, and a ring buffer for managing "byte stream" data in a clean |
| way. |
| |
| Note that in general, the collections are implemented as "intrusive" |
| data structures. The "node" data is the only struct used by the |
| library code, and it does not store a pointer or other metadata to |
| indicate what user data is "owned" by that node. Instead, the |
| expectation is that the node will be itself embedded within a |
| user-defined struct. Macros are provided to retrieve a user struct |
| address from the embedded node pointer in a clean way. The purpose |
| behind this design is to allow the collections to be used in contexts |
| where dynamic allocation is disallowed (i.e. there is no need to |
| allocate node objects because the memory is provided by the user). |
| |
| Note also that these libraries are uniformly unsynchronized; access to |
| them is not threadsafe by default. These are data structures, not |
| synchronization primitives. The expectation is that any locking |
| needed will be provided by the user. |
| |
| Single-linked List |
| ================== |
| |
| Zephyr provides a ``sys_slist_t`` type for storing simple |
| singly-linked list data (i.e. data where each list element stores a |
| pointer to the next element, but not the previous one). This supports |
| constant-time access to the first (head) and last (tail) elements of |
| the list, insertion before the head and after the tail of the list and |
| constant time removal of the head. Removal of subsequent nodes |
| requires access to the "previous" pointer and thus can only be |
| performed in linear time by searching the list. |
| |
| The ``sys_slist_t`` struct may be instantiated by the user in any |
| accessible memory. It should be initialized with either |
| ``sys_slist_init()`` or by static assignment from SYS_SLIST_STATIC_INIT |
| before use. Its interior fields are opaque and should not be accessed |
| by user code. |
| |
| The end nodes of a list may be retrieved with |
| ``sys_slist_peek_head()`` and ``sys_slist_peek_tail()``, which will |
| return NULL if the list is empty, otherwise a pointer to a |
| ``sys_snode_t`` struct. |
| |
| The ``sys_snode_t`` struct represents the data to be inserted. In |
| general, it is expected to be allocated/controlled by the user, |
| usually embedded within a struct which is to be added to the list. |
| The container struct pointer may be retrieved from a list node using |
| ``SYS_SLIST_CONTAINER()``, passing it the struct name of the |
| containing struct and the field name of the node. Internally, the |
| ``sys_snode_t`` struct contains only a next pointer, which may be |
| accessed with ``sys_slist_peek_next()``. |
| |
| Lists may be modified by adding a single node at the head or tail with |
| ``sys_slist_prepend()`` and ``sys_slist_append()``. They may also |
| have a node added to an interior point with ``sys_slist_insert()``, |
| which inserts a new node after an existing one. Similarly |
| ``sys_slist_remove()`` will remove a node given a pointer to its |
| predecessor. These operations are all constant time. |
| |
| Convenience routines exist for more complicated modifications to a |
| list. ``sys_slist_merge_slist()`` will append an entire list to an |
| existing one. ``sys_slist_append_list()`` will append a bounded |
| subset of an existing list in constant time. And |
| ``sys_slist_find_and_remove()`` will search a list (in linear time) |
| for a given node and remove it if present. |
| |
| Finally the slist implementation provides a set of "for each" macros |
| that allows for iterating over a list in a natural way without needing |
| to manually traverse the next pointers. ``SYS_SLIST_FOR_EACH_NODE()`` |
| will enumerate every node in a list given a local variable to store |
| the node pointer. ``SYS_SLIST_FOR_EACH_NODE_SAFE()`` behaves |
| similarly, but has a more complicated implementation that requires an |
| extra scratch variable for storage and allows the user to delete the |
| iterated node during the iteration. Each of those macros also exists |
| in a "container" variant (``SYS_SLIST_FOR_EACH_CONTAINER()`` and |
| ``SYS_SLIST_FOR_EACH_CONTAINER_SAFE()``) which assigns a local |
| variable of a type that matches the user's container struct and not |
| the node struct, performing the required offsets internally. And |
| ``SYS_SLIST_ITERATE_FROM_NODE()`` exists to allow for enumerating a |
| node and all its successors only, without inspecting the earlier part |
| of the list. |
| |
| Slist Internals |
| --------------- |
| |
| The slist code is designed to be minimal and conventional. |
| Internally, a ``sys_slist_t`` struct is nothing more than a pair of |
| "head" and "tail" pointer fields. And a ``sys_snode_t`` stores only a |
| single "next" pointer. |
| |
| .. figure:: slist.png |
| :align: center |
| :alt: slist example |
| :figclass: align-center |
| |
| An slist containing three elements. |
| |
| .. figure:: slist-empty.png |
| :align: center |
| :alt: empty slist example |
| :figclass: align-center |
| |
| An empty slist |
| |
| The specific implementation of the list code, however, is done with an |
| internal "Z_GENLIST" template API which allows for extracting those |
| fields from arbitrary structures and emits an arbitrarily named set of |
| functions. This allows for implementing more complicated |
| single-linked list variants using the same basic primitives. The |
| genlist implementor is responsible for a custom implementation of the |
| primitive operations only: an "init" step for each struct, and a "get" |
| and "set" primitives for each of head, tail and next pointers on their |
| relevant structs. These inline functions are passed as parameters to |
| the genlist macro expansion. |
| |
| Only one such variant, sflist, exists in Zephyr at the moment. |
| |
| Flagged List |
| ------------ |
| |
| The ``sys_sflist_t`` is implemented using the described genlist |
| template API. With the exception of symbol naming ("sflist" instead |
| of "sflist"), it operates in all ways identically to the slist API. |
| |
| It includes the ability to associate exactly two bits of user defined |
| "flags" with each list node. These can be accessed and modified with |
| ``sys_sflist_flags_get()`` and ``sys_sflist_flags_get()``. |
| Internally, the flags are stored unioned with the bottom bits of the |
| next pointer and incur no SRAM storage overhead when compared with the |
| simpler slist code. |
| |
| Double-linked List |
| ================== |
| |
| Similar to the single-linked list in many respects, Zephyr includes a |
| double-linked implementation. This provides the same algorithmic |
| behavior for all the existing slist operations, but also allows for |
| constant-time removal and insertion (at all points: before or after |
| the head, tail or any internal node). To do this, the list stores two |
| pointers per node, and thus has somewhat higher runtime code and |
| memory space needs. |
| |
| A ``sys_dlist_t`` struct may be instantiated by the user in any |
| accessible memory. It must be initialized with ``sys_dlist_init()`` |
| or ``SYS_DLIST_STATIC_INIT()`` before use. The ``sys_dnode_t`` struct |
| is expected to be provided by the user for any nodes addded to the |
| list (typically embedded within the struct to be tracked, as described |
| above). It must be initialized in zeroed/bss memory or with |
| ``sys_dnode_init()`` before use. |
| |
| Primitive operations may retrieve the head/tail of a list and the |
| next/prev pointers of a node with ``sys_dlist_peek_head()``, |
| ``sys_dlist_peek_tail()``, ``sys_dlist_peek_next()`` and |
| ``sys_dlist_peek_prev()``. These can all return NULL where |
| appropriate (i.e. for empty lists, or nodes at the endpoints of the |
| list). |
| |
| A dlist can be modified in constant time by removing a node with |
| ``sys_dlist_remove()``, by adding a node to the head or tail of a list |
| with ``sys_dlist_prepend()`` and ``sys_dlist_append()``, or by |
| inserting a node before an existing node with ``sys_dlist_insert()``. |
| |
| As for slist, each node in a dlist can be processed in a natural code |
| block style using ``SYS_DLIST_FOR_EACH_NODE()``. This macro also |
| exists in a "FROM_NODE" form which allows for iterating from a known |
| starting point, a "SAFE" variant that allows for removing the node |
| being inspected within the code block, a "CONTAINER" style that |
| provides the pointer to a containing struct instead of the raw node, |
| and a "CONTAINER_SAFE" variant that provides both properties. |
| |
| Convenience utilities provided by dlist include |
| ``sys_dlist_insert_at()``, which inserts a node that linearly searches |
| through a list to find the right insertion point, which is provided by |
| the user as a C callback function pointer, and |
| ``sys_dlist_is_linked()``, which will affirmatively return whether or |
| not a node is currently linked into a dlist or not (via an |
| implementation that has zero overhead vs. the normal list processing). |
| |
| Dlist Internals |
| --------------- |
| |
| Internally, the dlist implementation is minimal: the ``sys_dlist_t`` |
| struct contains "head" and "tail" pointer fields, the ``sys_dnode_t`` |
| contains "prev" and "next" pointers, and no other data is stored. But |
| in practice the two structs are internally identical, and the list |
| struct is inserted as a node into the list itself. This allows for a |
| very clean symmetry of operations: |
| |
| * An empty list has backpointers to itself in the list struct, which |
| can be trivially detected. |
| |
| * The head and tail of the list can be detected by comparing the |
| prev/next pointers of a node vs. the list struct address. |
| |
| * An insertion or deletion never needs to check for the special case |
| of inserting at the head or tail. There are never any NULL pointers |
| within the list to be avoided. Exactly the same operations are run, |
| without tests or branches, for all list modification primitives. |
| |
| Effectively, a dlist of N nodes can be thought of as a "ring" of "N+1" |
| nodes, where one node represents the list tracking struct. |
| |
| .. figure:: dlist.png |
| :align: center |
| :alt: dlist example |
| :figclass: align-center |
| |
| A dlist containing three elements. Note that the list struct |
| appears as a fourth "element" in the list. |
| |
| .. figure:: dlist-single.png |
| :align: center |
| :alt: single-element dlist example |
| :figclass: align-center |
| |
| An dlist containing just one element. |
| |
| .. figure:: dlist-empty.png |
| :align: center |
| :alt: dlist example |
| :figclass: align-center |
| |
| An empty dlist. |
| |
| Balanced Red/Black Tree |
| ======================= |
| |
| For circumstances where sorted containers may become large at runtime, |
| a list becomes problematic due to algorithmic costs of searching it. |
| For these situations, Zephyr provides a balanced tree implementation |
| which has runtimes on search and removal operations bounded at |
| O(log2(N)) for a tree of size N. This is implemented using a |
| conventional red/black tree as described by multiple academic sources. |
| |
| The ``struct rbtree`` tracking struct for a rbtree may be initialized |
| anywhere in user accessible memory. It should contain only zero bits |
| before first use. No specific initialization API is needed or |
| required. |
| |
| Unlike a list, where position is explicit, the ordering of nodes |
| within an rbtree must be provided as a predicate function by the user. |
| A function of type ``rb_lessthan_t()`` should be assigned to the |
| ``lessthan_fn`` field of the ``struct rbtree`` before any tree |
| operations are attempted. This function should, as its name suggests, |
| return a boolean True value if the first node argument is "less than" |
| the second in the ordering desired by the tree. Note that "equal" is |
| not allowed, nodes within a tree must have a single fixed order for |
| the algorithm to work correctly. |
| |
| As with the slist and dlist containers, nodes within an rbtree are |
| represented as a ``struct rbnode`` structure which exists in |
| user-managed memory, typically embedded within the the data structure |
| being tracked in the tree. Unlike the list code, the data within an |
| rbnode is entirely opaque. It is not possible for the user to extract |
| the binary tree topology and "manually" traverse the tree as it is for |
| a list. |
| |
| Nodes can be inserted into a tree with ``rb_insert()`` and removed |
| with ``rb_remove()``. Access to the "first" and "last" nodes within a |
| tree (in the sense of the order defined by the comparison function) is |
| provided by ``rb_get_min()`` and ``rb_get_max()``. There is also a |
| predicate, ``rb_contains()``, which returns a boolean True if the |
| provided node pointer exists as an element within the tree. As |
| described above, all of these routines are guaranteed to have at most |
| log time complexity in the size of the tree. |
| |
| There are two mechanisms provided for enumerating all elements in an |
| rbtree. The first, ``rb_walk()``, is a simple callback implementation |
| where the caller specifies a C function pointer and an untyped |
| argument to be passed to it, and the tree code calls that function for |
| each node in order. This has the advantage of a very simple |
| implementation, at the cost of a somewhat more cumbersome API for the |
| user (not unlike ISO C's ``bsearch()`` routine). It is a recursive |
| implementation, however, and is thus not always available in |
| environments that forbid the use of unbounded stack techniques like |
| recursion. |
| |
| There is also a ``RB_FOR_EACH()`` iterator provided, which, like the |
| similar APIs for the lists, works to iterate over a list in a more |
| natural way, using a nested code block instead of a callback. It is |
| also nonrecursive, though it requires log-sized space on the stack by |
| default (however, this can be configured to use a fixed/maximally size |
| buffer instead where needed to avoid the dynamic allocation). As with |
| the lists, this is also available in a ``RB_FOR_EACH_CONTAINER()`` |
| variant which enumerates using a pointer to a container field and not |
| the raw node pointer. |
| |
| Tree Internals |
| -------------- |
| |
| As described, the Zephyr rbtree implementation is a conventional |
| red/black tree as described pervasively in academic sources. Low |
| level details about the algorithm are out of scope for this document, |
| as they match existing conventions. This discussion will be limited |
| to details notable or specific to the Zephyr implementation. |
| |
| The core invariant guaranteed by the tree is that the path from the root of |
| the tree to any leaf is no more than twice as long as the path to any |
| other leaf. This is achieved by associating one bit of "color" with |
| each node, either red or black, and enforcing a rule that no red child |
| can be a child of another red child (i.e. that the number of black |
| nodes on any path to the root must be the same, and that no more than |
| that number of "extra" red nodes may be present). This rule is |
| enforced by a set of rotation rules used to "fix" trees following |
| modification. |
| |
| .. figure:: rbtree.png |
| :align: center |
| :alt: rbtree example |
| :figclass: align-center |
| |
| A maximally unbalanced rbtree with a black height of two. No more |
| nodes can be added underneath the rightmost node without |
| rebalancing. |
| |
| These rotations are conceptually implemented on top of a primitive |
| that "swaps" the position of one node with another in the list. |
| Typical implementations effect this by simply swapping the nodes |
| internal "data" pointers, but because the Zephyr ``struct rbnode`` is |
| intrusive, that cannot work. Zephyr must include somewhat more |
| elaborate code to handle the edge cases (for example, one swapped node |
| can be the root, or the two may already be parent/child). |
| |
| The ``struct rbnode`` struct for a Zephyr rbtree contains only two |
| pointers, representing the "left", and "right" children of a node |
| within the binary tree. Traversal of a tree for rebalancing following |
| modification, however, routinely requires the ability to iterate |
| "upwards" from a node as well. It is very common for red/black trees |
| in the industry to store a third "parent" pointer for this purpose. |
| Zephyr avoids this requirement by building a "stack" of node pointers |
| locally as it traverses downward thorugh the tree and updating it |
| appropriately as modifications are made. So a Zephyr rbtree can be |
| implemented with no more runtime storage overhead than a dlist. |
| |
| These properties, of a balanced tree data structure that works with |
| only two pointers of data per node and that works without any need for |
| a memory allocation API, are quite rare in the industry and are |
| somewhat unique to Zephyr. |
| |
| Ring Buffer |
| =========== |
| |
| For circumstances where an application needs to implement asynchronous |
| "streaming" copying of data, Zephyr provides a ``struct ring_buf`` |
| abstraction to manage copies of such data in and out of a shared |
| buffer of memory. Ring buffers may be used in either "bytes" mode, |
| where the data to be streamed is an uninterpreted array of bytes, or |
| "items" mode where the data much be an integral number of 32 bit |
| words. While the underlying data structure is the same, it is not |
| legal to mix these two modes on a single ring buffer instance. A ring |
| buffer initialized with a byte count must be used only with the |
| "bytes" API, one initialized with a word count must use the "items" |
| calls. |
| |
| A ``struct ring_buf`` may be placed anywhere in user-accessible |
| memory, and must be initialized with ``ring_buf_init()`` before use. |
| This must be provided a region of user-controlled memory for use as |
| the buffer itself. Note carefully that the units of the size of the |
| buffer passed change (either bytes or words) depending on how the ring |
| buffer will be used later. Macros for combining these steps in a |
| single static declaration exist for convenience. |
| ``RING_BUF_DECLARE()`` will declare and statically initialize a ring |
| buffer with a specified byte count, where |
| ``RING_BUF_ITEM_DECLARE_SIZE()`` will declare and statically |
| initialize a buffer with a given count of 32 bit words. |
| ``RING_BUF_ITEM_DECLARE_POW2()`` can be used to initialize an |
| items-mode buffer with a memory region guaranteed to be a power of |
| two, which enables various optimizations internal to the |
| implementation. No power-of-two initialization is available for |
| bytes-mode ring buffers. |
| |
| "Bytes" data may be copied into the ring buffer using |
| ``ring_buf_put()``, passing a data pointer and byte count. These |
| bytes will be copied into the buffer in order, as many as will fit in |
| the allocated buffer. The total number of bytes copied (which may be |
| fewer than provided) will be returned. Likewise ``ring_buf_get()`` |
| will copy bytes out of the ring buffer in the order that they were |
| written, into a user-provided buffer, returning the number of bytes |
| that were transferred. |
| |
| To avoid multiply-copied-data situations, a "claim" API exists for |
| byte mode. ``ring_buf_put_claim()`` takes a byte size value from the |
| user and returns a pointer to memory internal to the ring buffer that |
| can be used to receive those bytes, along with a size of the |
| contiguous internal region (which may be smaller than requested). The |
| user can then copy data into that region at a later time without |
| assembling all the bytes in a single region first. When complete, |
| ``ring_buf_put_finish()`` can be used to signal the buffer that the |
| transfer is complete, passing the number of bytes actually |
| transferred. At this point a new transfer can be initiated. |
| Similarly, ``ring_buf_get_claim()`` returns a pointer to internal ring |
| buffer data from which the user can read without making a verbatim |
| copy, and ``ring_buf_get_finish()`` signals the buffer with how many |
| bytes have been consumed and allows for a new transfer to begin. |
| |
| "Items" mode works similarly to bytes mode, except that all transfers |
| are in units of 32 bit words and all memory is assumed to be aligned |
| on 32 bit boundaries. The write and read operations are |
| ``ring_buf_item_put()`` and ``ring_buf_item_get()``, and work |
| otherwise identically to the bytes mode APIs. There no "claim" API |
| provided for items mode. One important difference is that unlike |
| ``ring_buf_put()``, ``ring_buf_item_put()`` will not do a partial |
| transfer; it will return an error in the case where the provided data |
| does not fit in its entirety. |
| |
| The user can manage the capacity of a ring buffer without modifying it |
| using the ``ring_buf_space_get()`` call (which returns a value of |
| either bytes or items depending on how the ring buffer has been used), |
| or by testing the ``ring_buf_is_empty()`` predicate. |
| |
| Finally, a ``ring_buf_reset()`` call exists to immediately empty a |
| ring buffer, discarding the tracking of any bytes or items already |
| written to the buffer. It does not modify the memory contents of the |
| buffer itself, however. |
| |
| Ring Buffer Internals |
| --------------------- |
| |
| Data streamed through a ring buffer is always written to the next byte |
| within the buffer, wrapping around to the first element after reaching |
| the end, thus the "ring" structure. Internally, the ``struct |
| ring_buf`` contains its own buffer pointer and its size, and also a |
| "head" and "tail" index representing where the next read and write |
| |
| This boundary is invisible to the user using the normal put/get APIs, |
| but becomes a barrier to the "claim" API, because obviously no |
| contiguous region can be returned that crosses the end of the buffer. |
| This can be surprising to application code, and produce performance |
| artifacts when transfers need to alias closely to the size of the |
| buffer, as the number of calls to claim/finish need to double for such |
| transfers. |
| |
| When running in items mode (only), the ring buffer contains two |
| implementations for the modular arithmetic required to compute "next |
| element" offsets. One is used for arbitrary sized buffers, but the |
| other is optimized for power of two sizes and can replace the compare |
| and subtract steps with a simple bitmask in several places, at the |
| cost of testing the "mask" value for each call. |