.. _module-pw_enum:
=======
pw_enum
=======
.. pigweed-module-subpage::
:name: pw_enum
``pw_enum`` supports automatic stringifying and tokenizing of C++ enums. It
works by parsing standard C++ header files and generating versions of those
headers with the minimal additions needed to support these features.
Why use ``pw_enum``?
* **Efficient string or tokenized logging**: Stringifies or :ref:`tokenizes
<module-pw_tokenizer>` logs automatically for seamless logging.
* **Automatic content-based versioning**: Generates version hashes to prevent
collisions as values change.
-----------------------------------------
Automatic tokenized and stringified enums
-----------------------------------------
``pw_enum`` works on enums declared in standard C++ header files. To use
``pw_enum``:
1. Declare one or more enums in a header file.
2. Include the header file in a ``pw_cc_enum`` target instead of a standard
``cc_library``.
3. Include :cs:`pw_enum/generate.h` in the header file.
4. Register the enum using the :cc:`PW_ENUM(MyEnum, ...) <PW_ENUM>` macro at
global scope. List the fully qualified enum name, followed by all of its
enumerators. If an enumerator has multiple aliases, only include one of them.
.. important::
The ``PW_ENUM`` macro **must be called at global scope** (outside of any
namespace blocks, class definitions, or functions).
If ``PW_ENUM`` is called inside a namespace block, class, or function, the
C++ compiler will reject it with a compilation error indicating that the
template specialization of
``_PW_ENUM_cannot_be_used_within_namespaces`` must occur at global scope.
``pw_enum`` headers are parsed during the build to support versioned
tokenization and stringification with :cc:`pw::EnumToString`.
Example
=======
Declare enums in a standard C++ header and call :cc:`PW_ENUM(MyEnum, ...)
<PW_ENUM>` at the bottom of the file, outside of any namespace blocks (in the
global namespace).
.. literalinclude:: examples/private/enum_example/basic_enum.h
:language: cpp
:start-after: // the License.
Use the enum normally. It is tokenized with :cc:`PW_TOKENIZE_ENUM` and works
with tokenized logs and :cc:`pw::EnumToString`.
.. literalinclude:: examples/basic_enum.cc
:language: cpp
:start-after: // the License.
Enums can reference values from other enums, even if they reside in different
files and namespaces.
.. literalinclude:: examples/private/enum_example/other_enum.h
:language: cpp
:start-after: // the License.
.. literalinclude:: examples/enum_example/references_other_enum.h
:language: cpp
:start-after: // the License.
.. literalinclude:: examples/BUILD.bazel
:language: python
:start-after: [pw_enum-examples-advanced-bazel]
:end-before: [pw_enum-examples-advanced-bazel]
Enumerator names
================
By default, enumerator names that follow Google's ``kEnumName`` style are
converted to upper snake case, without the ``k`` prefix (``ENUM_NAME``). Names
that do not follow Google style are used directly.
To override the default enumerator name, specify it in the :cc:`PW_ENUM(name,
...) <PW_ENUM>` macro with a string literal after ``=``. For example:
.. code-block:: cpp
PW_ENUM(my::Enum, // String name:
kStandardStyle, // "STANDARD_STYLE"
kCustom = "custom_name", // "custom_name"
nonStandard, // "nonStandard"
);
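The default naming rule described above can be sketched in a few lines of C++.
This is only an illustration: ``DefaultDisplayName`` is a hypothetical helper,
not part of ``pw_enum``, and it assumes that "Google style" means a ``k``
followed by an uppercase letter and then only letters and digits. How the real
generator treats acronyms or digits is not covered here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not a pw_enum API): derives a display name from an
// enumerator name using the default rule described in this section.
std::string DefaultDisplayName(std::string_view name) {
  // Assumption: Google style is 'k' + uppercase letter + letters/digits only.
  bool google_style = name.size() >= 2 && name[0] == 'k' &&
                      std::isupper(static_cast<unsigned char>(name[1]));
  for (char c : name) {
    if (!std::isalnum(static_cast<unsigned char>(c))) {
      google_style = false;
    }
  }
  if (!google_style) {
    return std::string(name);  // Other names are used directly.
  }

  std::string result;
  for (std::size_t i = 1; i < name.size(); ++i) {  // Skip the 'k' prefix.
    const unsigned char c = static_cast<unsigned char>(name[i]);
    // Start a new word at each uppercase letter after the first one.
    if (i > 1 && std::isupper(c)) {
      result += '_';
    }
    result += static_cast<char>(std::toupper(c));
  }
  return result;
}
```

With this sketch, ``kStandardStyle`` maps to ``STANDARD_STYLE`` and
``nonStandard`` is returned unchanged, matching the comments in the
``PW_ENUM`` example.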
Enumerator aliases
==================
If multiple enumerator names share the same value (aliases), they can be
registered together in the ``PW_ENUM`` macro. The generator groups registered
aliases, sorting their display names alphabetically and joining them with ``|``
(e.g. ``"ALIAS_ALPHA|ALPHA"``). To omit aliases, simply leave them out of
``PW_ENUM``.
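The grouping step can be pictured with a small C++ sketch. ``GroupAliases`` is
a hypothetical function written for this illustration, not the generator's
actual code, and it assumes a plain byte-wise sort of the display names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a pw_enum API): maps each enumerator value to the
// sorted, '|'-joined display names registered for that value.
std::map<int, std::string> GroupAliases(
    const std::vector<std::pair<std::string, int>>& enumerators) {
  // Collect every display name that shares a value.
  std::map<int, std::vector<std::string>> by_value;
  for (const auto& [name, value] : enumerators) {
    by_value[value].push_back(name);
  }

  std::map<int, std::string> result;
  for (auto& [value, names] : by_value) {
    // Assumption: "alphabetically" means std::string's byte-wise ordering.
    std::sort(names.begin(), names.end());
    std::string joined;
    for (const std::string& name : names) {
      if (!joined.empty()) {
        joined += '|';
      }
      joined += name;
    }
    result[value] = joined;
  }
  return result;
}
```

For example, registering ``BETA`` and ``DELTA`` with the same value yields the
single display name ``BETA|DELTA`` in this sketch.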
Logging enums
=============
Enums generated by ``pw_enum`` natively support Pigweed's tokenized logging
infrastructure.
* **Versioned format macro**: ``pw_enum`` generates a macro to use in the format
string for the enum. The macro is named for the namespace and enum name
(e.g. ``MY_NESTED_PKG_MY_ENUM``). The macro evaluates to a string literal
that can be concatenated into a format string.
The macro is versioned based on the enum's contents. The version changes
automatically when the enum changes, so tokenized logs of enums never have
collisions.
A ``*_DOMAIN`` macro (e.g. ``MY_NESTED_PKG_MY_ENUM_DOMAIN``) is also generated
with the enum's tokenization domain, for use with :ref:`nested tokenization
<module-pw_tokenizer-nested-arguments>`.
* **Argument macro**: Include :cs:`pw_log/tokenized_args.h` and use
:cc:`PW_LOG_ENUM(value) <PW_LOG_ENUM>` as the argument to the log
statement.
When using a tokenizing logging backend, the generated format macro evaluates to
``PW_TOKEN_FMT(::namespace::Enum)``, and :cc:`PW_LOG_ENUM` resolves to
:cc:`pw::tokenizer::EnumToToken`, logging the 32-bit token. When using a
standard string-based logging backend, the format macro yields the string format
specifier ``%s``, and :cc:`PW_LOG_ENUM` resolves to :cc:`pw::EnumToString`,
which yields the string representation.
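The string-backend case can be mimicked without any Pigweed headers. In the
sketch below, ``MY_PKG_STATE``, ``EXAMPLE_LOG_ENUM``, and ``StateName`` are
hypothetical stand-ins for the generated format macro, :cc:`PW_LOG_ENUM`, and
:cc:`pw::EnumToString`. It only shows how a macro that expands to a string
literal is concatenated into the format string.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

namespace my_pkg {

enum class State { kIdle, kRunning };

// Stand-in for the string conversion that pw_enum would provide.
constexpr const char* StateName(State state) {
  switch (state) {
    case State::kIdle:
      return "IDLE";
    case State::kRunning:
      return "RUNNING";
  }
  return "UNKNOWN";
}

}  // namespace my_pkg

// Hypothetical stand-ins for the generated format macro and the argument
// macro when a string-based logging backend is in use.
#define MY_PKG_STATE "%s"
#define EXAMPLE_LOG_ENUM(value) ::my_pkg::StateName(value)

std::string FormatState(my_pkg::State state) {
  char buffer[64];
  // Adjacent string literals are concatenated by the compiler, so the format
  // string seen by snprintf is "State: %s".
  std::snprintf(buffer, sizeof(buffer), "State: " MY_PKG_STATE,
                EXAMPLE_LOG_ENUM(state));
  return buffer;
}
```

Because the call site only names the two macros, the same log statement could
compile against either kind of backend once the macros are redefined.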
Example
-------
.. literalinclude:: examples/basic_enum_test.cc
:language: cpp
:start-after: [pw_enum-examples-basic-cc-log]
:end-before: [pw_enum-examples-basic-cc-log]
Build integration
=================
``pw_enum`` provides build integration for Bazel, GN, and CMake.
.. tab-set::
.. tab-item:: Bazel
Use the ``pw_cc_enum`` rule from :cs:`//pw_enum:pw_cc_enum.bzl`.
.. literalinclude:: BUILD.bazel
:language: python
:start-after: [pw_enum-basic-bazel]
:end-before: [pw_enum-basic-bazel]
.. tab-item:: GN
Use the ``pw_cc_enum`` template from :cs:`"$dir_pw_enum/pw_cc_enum.gni"
<//pw_enum:pw_cc_enum.gni>`.
.. literalinclude:: BUILD.gn
:start-after: [pw_enum-basic-gn]
:end-before: [pw_enum-basic-gn]
.. tab-item:: CMake
Use the ``pw_cc_enum`` function from :cs:`pw_enum/pw_cc_enum.cmake
<//pw_enum:pw_cc_enum.cmake>`.
.. literalinclude:: CMakeLists.txt
:language: cmake
:start-after: [pw_enum-basic-cmake]
:end-before: [pw_enum-basic-cmake]
------------------
Stringifying enums
------------------
:cc:`pw::EnumToString` returns a string version of an enum. It uses a FTADLE
extension point ``PwEnumToString(enum)``. FTADLE is a pattern that enables
customization by searching for a matching function via Argument-Dependent Lookup
(ADL). For more information, see `Designing Extension Points With FTADLE
<https://abseil.io/tips/218>`_.
If you don't use ``pw_cc_enum``, you can manually use :cc:`PW_TOKENIZE_ENUM`
in :cs:`pw_tokenizer/enum.h <pw_tokenizer/public/pw_tokenizer/enum.h>` to
tokenize the enum and implement ``PwEnumToString``.
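A minimal sketch of the ADL-based extension point follows. The
``example::EnumToString`` template is a simplified stand-in, not the real
:cc:`pw::EnumToString`, and the return type and the ``my_pkg::Color`` enum are
assumptions made for illustration.

```cpp
#include <cassert>
#include <string_view>

namespace example {

// Simplified stand-in for a generic entry point. The unqualified call lets
// Argument-Dependent Lookup find the overload in the enum's own namespace.
template <typename Enum>
constexpr std::string_view EnumToString(Enum value) {
  return PwEnumToString(value);
}

}  // namespace example

namespace my_pkg {

enum class Color { kRed, kGreen };

// The extension point: declared next to the enum so that ADL can find it.
constexpr std::string_view PwEnumToString(Color color) {
  switch (color) {
    case Color::kRed:
      return "RED";
    case Color::kGreen:
      return "GREEN";
  }
  return "UNKNOWN";
}

}  // namespace my_pkg
```

Calling ``example::EnumToString(my_pkg::Color::kGreen)`` resolves to
``my_pkg::PwEnumToString`` without the generic code knowing about ``my_pkg``.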
----------------------
Cross language support
----------------------
``pw_enum`` is currently C++-only, but could be expanded to support other
languages. The parser extracts the full enum definition and resolves all
enumerator values, so it would be straightforward to generate compatible enum
definitions for other languages from a C++ header. ``pw_enum`` could also
support an alternate format, such as JSON, for the original enum definition, and
generate C++ and other languages from that.
----------
Background
----------
:ref:`module-pw_tokenizer` is one of Pigweed's most widely adopted features. It
has supported nested tokenization---a tokenized message inside another tokenized
message---since the early days. Initially, only Base64-encoded messages were
supported, which is inefficient. Support for directly encoding nested messages
as 32-bit integers was added later (see :ref:`seed-0105`).
With support for encoding tokens as integers, supporting rich enums was a clear
next step. This culminated in the creation of :cs:`pw_tokenizer/enum.h
<pw_tokenizer/public/pw_tokenizer/enum.h>` and its supporting macros. This
approach uses the enum's integral value as a nested token, discriminated by its
namespace to avoid collisions between different enum types. The result is
highly efficient enum logs that are still readable and user-friendly.
The need for versioning
=======================
Real-world deployment of :cs:`pw_tokenizer/enum.h
<pw_tokenizer/public/pw_tokenizer/enum.h>` soon revealed a critical flaw. When
enum values were changed or reordered during development, the resulting tokens
changed. When merging token databases from different builds, this led to
collisions, where the same token mapped to different string representations. It
became clear that enum tokenization required versioning.
Several alternatives were explored for automatic enum versioning. A key
constraint was that it must remain possible to set enum values with
expressions, which may reference constants or other enum values. Approaches
considered included:
* **Tokenize names instead of values**: Hash the enumerator names and generate
a function with a switch statement to map values to tokens at runtime.
* **Version in the domain**: Incorporate a hash of the enum's contents (names
and values) into the tokenization domain, requiring two arguments to log an
enum (the version and the value).
* **Calculate tokens from a base**: This approach used a hash of the enum's
contents as a base offset, adding the enum value to it at the call site.
Ultimately, these approaches were ruled out because they increased code size
relative to the existing implementation, primarily due to the additional code
required at the call site.
The code size penalty could be avoided if there were a ``constexpr`` way to
insert the enum's version into the log format string. Then, the existing token
logging macros could be used (``PW_LOG_FMT``). For example:
.. code-block:: cpp
// If there were a way to define this macro during compilation, versioned
// enums would have no code size cost relative to unversioned enums.
#define MY_ENUM_FMT PW_LOG_FMT("::my::Enum::version_1234")
PW_LOG_INFO("My enum: " MY_ENUM_FMT, PW_LOG_ENUM(my_enum));
Unfortunately, there is no way to get the versioned enum domain into a
concatenatable string literal. This is required for compatibility with
``pw_log``'s C-style API. If ``pw_log`` offered a C++-only API, this would be
feasible, but adding such an API was out of scope.
Generating enums
================
Generating enums appeared to be the only way to get the enum's version into a
string literal at compile time. This led to the creation of ``pw_enum``.
JSON definition
---------------
The initial implementation of ``pw_enum`` generated C++ headers from JSON files.
While this worked well technically, it proved too difficult for projects to
adopt due to the friction of maintaining JSON definitions for standard C++
enums. Protocol Buffers were considered in place of JSON, but they are too
limited for this use case. Protobufs do not support setting enumerator values
based on other enums or external constants.
Parse C++ source
----------------
Finally, multiple approaches for parsing enum definitions out of C++ source code
were explored. These included:
- Use `libclang <https://clang.llvm.org/docs/LibClang.html>`_ from Python to
parse header files. This would be robust and even perform ``constexpr``
evaluation of enumerator values. Unfortunately, ``libclang`` is a large
dependency, and is not readily available on all platforms.
- Parse ``clang``'s `-ast-dump <https://clang.llvm.org/docs/IntroductionToTheClangAST.html#examining-the-ast>`_
output. This would be fairly robust, but would involve parsing moderately
complex, non-standard text output intended for human consumption. It also
requires ``clang``, which not all projects build with.
- Use a custom Python parser. This approach would be utterly impractical and
brittle.
Ultimately, parsing C++ source directly proved infeasible. The final design
uses the :cc:`PW_ENUM(name, ...) <PW_ENUM>` macro to avoid parsing arbitrary
C++ source.
------
Design
------
The final design of ``pw_enum`` addresses the constraints identified during its
evolution by combining standard C++ header files with a specialized build-time
generator powered by compile-time template evaluation.
This architecture provides a seamless user experience with zero runtime overhead
and robust protection against database collisions, while maintaining the full
expressiveness of standard C++ enum definitions.
Source files
============
Users define enums in standard C++ header files. To opt in to ``pw_enum``, the
header includes :cs:`pw_enum/generate.h` and registers each enum by calling the
:cc:`PW_ENUM(...) <PW_ENUM>` macro with the enum name and names of all of its
enumerators.
The macro's primary purpose is to capture enum metadata in an easily parsable
format. ``PW_ENUM()`` expands to the enum name and list of enumerators,
surrounded by unique markers. A :cs:`Python script
<pw_enum/py/pw_enum/parse.py>` searches a preprocessed source file for the
markers and extracts the enum metadata.
The macro also serves to require that users list the file in a ``pw_cc_enum``
target (see `Build system integration`_), which is necessary for it to be
processed. If the file is not processed by ``pw_enum`` machinery, the macro
expands to ``static_assert(false)``, causing the build to fail with an
informative error.
Enumerator evaluation
=====================
Enumerator values can be defined by arbitrary C++ expressions, which may
reference constants or enums from other files. The values may therefore change
even if the individual source file does not change.
The :cs:`parse.py <pw_enum/py/pw_enum/parse.py>` script evaluates enumerators
by generating a source file that references them. The source file instantiates
a template with the enumerators as template arguments. Compilation fails, but
the compiler's error output includes the enumerator values in an easily
parsable form.
This solution is far from ideal, but has proven to be robust. It evaluates
enumerators with the same toolchain as the rest of the project. Printing
compile-time constants with failed template instantiations is a common
workaround to achieve compile-time "printf" functionality. The script searches
for a unique template name and doesn't depend on a particular compiler or
version.
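The compile-time "printf" trick can be sketched as follows. The names
``ExampleEnumValuesProbe`` and ``EXAMPLE_PRINT_VALUES`` are hypothetical and do
not reflect the template or markers that ``parse.py`` actually emits. The
failing line is guarded by a macro so the file still compiles by default.

```cpp
#include <cassert>

namespace my_pkg {

inline constexpr int kBase = 4;

// Enumerator values are ordinary constant expressions.
enum class Level : int { kLow = kBase, kHigh = kLow * 10 + 2 };

}  // namespace my_pkg

// Declared but never defined. Defining an object of this type is an error,
// and the diagnostic names the instantiation with its evaluated arguments.
template <long long... kValues>
struct ExampleEnumValuesProbe;

#ifdef EXAMPLE_PRINT_VALUES
// With EXAMPLE_PRINT_VALUES defined, compilation fails here. Compilers
// typically mention ExampleEnumValuesProbe<4, 42> in the error message.
ExampleEnumValuesProbe<static_cast<long long>(my_pkg::Level::kLow),
                       static_cast<long long>(my_pkg::Level::kHigh)>
    probe;
#endif

// The compiler has already evaluated the expressions.
static_assert(static_cast<long long>(my_pkg::Level::kHigh) == 42);
```

A script can then search the compiler output for the probe's name and read the
values between the angle brackets. The exact diagnostic wording differs between
compilers, which is why searching for a unique template name is the stable part.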
Build-time generation
=====================
After parsing, the ``pw_cc_enum`` target runs a Python script
(:cs:`pw_enum/py/pw_enum/generate.py`) that generates a header.
1. **Enum generation**: The script generates a "shadowed" version of the header
in the build directory. This generated header contains the original content,
plus a footer with tokenization metadata. It also replaces the
``PW_ENUM(...)`` calls with ``_PW_ENUM_GENERATED(...)``. In a header that has
not been processed, ``PW_ENUM(...)`` expands to ``static_assert(false)``,
which requires users to build headers with ``pw_cc_enum``.
2. **Versioning**: A unique version hash is calculated for each enum based on
its fully qualified name and the names and values of all its enumerators.
This hash is used to construct a unique tokenization domain (e.g.,
``::namespace::_pw_enum_HASH::EnumName``). This ensures that if the enum
changes, the domain changes, preventing collisions in merged token databases.
3. **Tokenization**: The generated footer includes a call to
:cc:`PW_TOKENIZE_ENUM_CUSTOM` from :ref:`module-pw_tokenizer`, which
registers the enum values and their string representations in the database.
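The versioning step above can be sketched as a content hash folded into the
domain string. This is an illustration under stated assumptions: the hash
function (32-bit FNV-1a here), the way names and values are serialized, and
the ``VersionedDomain`` helper are all hypothetical. Only the overall
``::namespace::_pw_enum_HASH::EnumName`` shape comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Assumption: 32-bit FNV-1a, chosen only for illustration.
std::uint32_t Fnv1a(const std::string& data) {
  std::uint32_t hash = 2166136261u;
  for (unsigned char c : data) {
    hash ^= c;
    hash *= 16777619u;
  }
  return hash;
}

// Hypothetical helper: builds a domain that changes whenever the enum's
// qualified name or any enumerator name or value changes.
std::string VersionedDomain(
    const std::string& ns,
    const std::string& enum_name,
    const std::vector<std::pair<std::string, long long>>& enumerators) {
  // Serialize the qualified name plus every name/value pair.
  std::string contents = ns + "::" + enum_name;
  for (const auto& [name, value] : enumerators) {
    contents += ';' + name + '=' + std::to_string(value);
  }

  char hex[9];  // 8 hex digits plus the null terminator.
  std::snprintf(hex, sizeof(hex), "%08x",
                static_cast<unsigned>(Fnv1a(contents)));
  return ns + "::_pw_enum_" + hex + "::" + enum_name;
}
```

Changing any enumerator value changes the serialized contents, so the two
versions of the enum end up in different domains and cannot collide when token
databases are merged.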
Build system integration
========================
The ``pw_cc_enum`` build rule automates the process of parsing C++ headers and
generating versioned enum metadata. It invokes the ``pw_enum`` generator with
the correct compilation flags and ensures the generated headers are prioritized
during compilation.
* **Bazel** (:cs:`pw_cc_enum.bzl <//pw_enum:pw_cc_enum.bzl>`) creates an
internal library target to collect the compilation flags (includes, defines)
required to parse the header correctly and passes them to the Python script.
* **CMake** (:cs:`pw_cc_enum.cmake <//pw_enum:pw_cc_enum.cmake>`) creates an
internal interface library to collect includes and defines from dependencies,
and uses ``file(GENERATE)`` to produce a flags file for the generator. It
uses ``-iquote`` to ensure that the build system prioritizes the generated
shadowed header over the original source header.
* **GN** (:cs:`pw_cc_enum.gni <//pw_enum:pw_cc_enum.gni>`) compiles a
placeholder C++ file with the enum's dependencies to generate a target Ninja
file. The generator script parses the target Ninja file and the toolchain's
Ninja file to extract the compiler and its compilation flags (defines,
includes, and flags) to run the evaluation step. This is similar to how
:ref:`module-pw_compilation_testing` works. Like CMake, the GN build uses
``-iquote`` to ensure that the build system prioritizes the generated shadowed
header over the original source header.