| .. _module-pw_enum: |
| |
| ======= |
| pw_enum |
| ======= |
| .. pigweed-module-subpage:: |
| :name: pw_enum |
| |
| ``pw_enum`` supports automatic stringifying and tokenizing of C++ enums. It |
| works by parsing standard C++ header files and generating versions of those |
| headers with minimal additions needed to support these features. |
| |
| Why use ``pw_enum``? |
| |
| * **Efficient string or tokenized logging**: Stringifies or :ref:`tokenizes |
| <module-pw_tokenizer>` enum values automatically when they are logged. |
| * **Automatic content-based versioning**: Generates a version hash from each |
| enum's contents, so tokens do not collide when enumerators change. |
| |
| ----------------------------------------- |
| Automatic tokenized and stringified enums |
| ----------------------------------------- |
| ``pw_enum`` works on enums declared in standard C++ header files. To use |
| ``pw_enum``: |
| |
| 1. Declare one or more enums in a header file. |
| 2. Include the header file in a ``pw_cc_enum`` target instead of a standard |
| ``cc_library``. |
| 3. Include :cs:`pw_enum/generate.h` in the header file. |
| 4. Register the enum using the :cc:`PW_ENUM(MyEnum, ...) <PW_ENUM>` macro at |
| global scope. List the fully qualified enum name, followed by all of its |
| enumerators. If an enumerator has multiple aliases, only include one of them. |
| |
| .. important:: |
| |
| The ``PW_ENUM`` macro **must be called at global scope** (outside of any |
| namespace blocks, class definitions, or functions). |
| |
| If ``PW_ENUM`` is called inside a namespace block, class, or function, the |
| C++ compiler will reject it with a compilation error indicating that the |
| template specialization of |
| ``_PW_ENUM_cannot_be_used_within_namespaces`` must occur at global scope. |
| |
| ``pw_enum`` headers are parsed during the build to support versioned |
| tokenization and stringification with :cc:`pw::EnumToString`. |
| |
| Example |
| ======= |
| Declare enums in a standard C++ header and call :cc:`PW_ENUM(MyEnum, ...) |
| <PW_ENUM>` at the bottom of the file, outside of any namespace blocks (in the |
| global namespace). |
| |
| .. literalinclude:: examples/private/enum_example/basic_enum.h |
| :language: cpp |
| :start-after: // the License. |
| |
| Use the enum normally. It is tokenized with :cc:`PW_TOKENIZE_ENUM` and works |
| with tokenized logs and :cc:`pw::EnumToString`. |
| |
| .. literalinclude:: examples/basic_enum.cc |
| :language: cpp |
| :start-after: // the License. |
| |
| Enums can reference values from other enums, even if they reside in different |
| files and namespaces. |
| |
| .. literalinclude:: examples/private/enum_example/other_enum.h |
| :language: cpp |
| :start-after: // the License. |
| |
| .. literalinclude:: examples/enum_example/references_other_enum.h |
| :language: cpp |
| :start-after: // the License. |
| |
| .. literalinclude:: examples/BUILD.bazel |
| :language: python |
| :start-after: [pw_enum-examples-advanced-bazel] |
| :end-before: [pw_enum-examples-advanced-bazel] |
| |
| Enumerator names |
| ================ |
| By default, enumerator names that follow Google's ``kEnumName`` style are |
| converted to upper snake case, without the ``k`` prefix (``ENUM_NAME``). Names |
| that do not follow Google style are used directly. |
| |
| To override the default enumerator name, specify it in the :cc:`PW_ENUM(name, |
| ...) <PW_ENUM>` macro with a string literal after ``=``. For example: |
| |
| .. code-block:: cpp |
| |
| PW_ENUM(my::Enum, // String name: |
| kStandardStyle, // "STANDARD_STYLE" |
| kCustom = "custom_name", // "custom_name" |
| nonStandard, // "nonStandard" |
| ); |
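
The default naming rule described at the start of this section can be sketched
in C++ as follows. This is only an illustration: ``DisplayName`` and
``DefaultDisplayName`` are hypothetical names, the real conversion happens
inside the ``pw_enum`` generator, and the handling of edge cases such as
consecutive capital letters is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Fixed-capacity buffer so the conversion can run in constant expressions.
struct DisplayName {
  std::array<char, 64> chars{};
  std::size_t size = 0;

  // Appends one character; overly long names are silently truncated.
  constexpr void Append(char c) {
    if (size < chars.size()) {
      chars[size++] = c;
    }
  }

  constexpr std::string_view view() const {
    return std::string_view(chars.data(), size);
  }
};

constexpr bool IsUpper(char c) { return c >= 'A' && c <= 'Z'; }
constexpr bool IsLower(char c) { return c >= 'a' && c <= 'z'; }

// Hypothetical helper that mirrors the documented default. It is not the
// generator's actual code.
constexpr DisplayName DefaultDisplayName(std::string_view name) {
  DisplayName out;
  // Assumption: "Google style" means a 'k' followed by a capital letter.
  const bool google_style =
      name.size() >= 2 && name[0] == 'k' && IsUpper(name[1]);
  if (!google_style) {
    for (char c : name) {
      out.Append(c);  // Names in any other style are used directly.
    }
    return out;
  }
  for (std::size_t i = 1; i < name.size(); ++i) {  // Start at 1 to drop 'k'.
    const char c = name[i];
    // Insert '_' at each word boundary (a capital after a non-capital).
    if (i > 1 && IsUpper(c) && !IsUpper(name[i - 1])) {
      out.Append('_');
    }
    out.Append(IsLower(c) ? static_cast<char>(c - 'a' + 'A') : c);
  }
  return out;
}

static_assert(DefaultDisplayName("kStandardStyle").view() == "STANDARD_STYLE");
static_assert(DefaultDisplayName("nonStandard").view() == "nonStandard");
```

A name given explicitly with ``= "custom_name"`` would bypass a helper like
this entirely.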
| |
| Enumerator aliases |
| ================== |
| If multiple enumerator names share the same value (aliases), they can be |
| registered together in the ``PW_ENUM`` macro. The generator groups registered |
| aliases, sorting their display names alphabetically and joining them with ``|`` |
| (e.g. ``"ALIAS_ALPHA|ALPHA"``). To omit aliases, simply leave them out of |
| ``PW_ENUM``. |
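
The grouping rule can be illustrated with a short C++ sketch. ``GroupAliases``
is a hypothetical helper, not part of ``pw_enum``, and it assumes that
"alphabetically" means plain byte-wise ordering of the display names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: groups registered display names by enumerator value,
// sorts each group, and joins the names in a group with '|'.
inline std::map<long long, std::string> GroupAliases(
    const std::vector<std::pair<std::string, long long>>& enumerators) {
  std::map<long long, std::vector<std::string>> groups;
  for (const auto& [display_name, value] : enumerators) {
    assert(!display_name.empty());  // Display names are expected to be set.
    groups[value].push_back(display_name);
  }

  std::map<long long, std::string> labels;
  for (auto& [value, names] : groups) {
    std::sort(names.begin(), names.end());  // Byte-wise ordering (assumption).
    std::string& label = labels[value];
    for (const std::string& name : names) {
      if (!label.empty()) {
        label += '|';
      }
      label += name;
    }
  }
  return labels;
}
```

With this sketch, registering ``FIRST`` and ``BEGIN`` for the same value
produces the label ``BEGIN|FIRST``, while an enumerator without aliases keeps
its single name.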
| |
| Logging enums |
| ============= |
| Enums generated by ``pw_enum`` natively support Pigweed's tokenized logging |
| infrastructure. |
| |
| * **Versioned format macro**: ``pw_enum`` generates a macro to use in the format |
| string for the enum. The macro is named for the namespace and enum name |
| (e.g. ``MY_NESTED_PKG_MY_ENUM``). The macro evaluates to a string literal |
| that can be concatenated into a format string. |
| |
| The macro is versioned based on the enum's contents. The version changes |
| automatically when the enum changes, so tokenized logs of enums never have |
| collisions. |
| |
| A ``*_DOMAIN`` macro (e.g. ``MY_NESTED_PKG_MY_ENUM_DOMAIN``) is also generated |
| with the enum's tokenization domain, for use with :ref:`nested tokenization |
| <module-pw_tokenizer-nested-arguments>`. |
| |
| * **Argument macro**: Include :cs:`pw_log/tokenized_args.h` and use |
| :cc:`PW_LOG_ENUM(value) <PW_LOG_ENUM>` as the argument to the log |
| statement. |
| |
| When using a tokenizing logging backend, the generated format macro evaluates to |
| ``PW_TOKEN_FMT(::namespace::Enum)``, and :cc:`PW_LOG_ENUM` resolves to |
| :cc:`pw::tokenizer::EnumToToken`, logging the 32-bit token. When using a |
| standard string-based logging backend, the format macro yields the string format |
| specifier ``%s``, and :cc:`PW_LOG_ENUM` resolves to :cc:`pw::EnumToString`, |
| which yields the string representation. |
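
As a rough illustration of the string-based case, the sketch below defines a
stand-in for a generated format macro and splices it into a format string. The
macro is defined by hand here purely for demonstration (real macros come from
the generated header), and ``kLogFormat`` and ``FormatLog`` are hypothetical
names.

```cpp
#include <cstddef>
#include <cstdio>
#include <string_view>

// Hand-written stand-in for a generated format macro. With a string-based
// backend the macro yields "%s"; a tokenizing backend yields a token format
// specifier instead.
#define MY_NESTED_PKG_MY_ENUM "%s"

// Adjacent string literals are concatenated by the compiler, so the macro can
// be spliced directly into a longer format string.
inline constexpr char kLogFormat[] = "State changed to " MY_NESTED_PKG_MY_ENUM;

static_assert(std::string_view(kLogFormat) == "State changed to %s");

// Formats the message the way a printf-style backend might, given the string
// form of the enumerator.
inline int FormatLog(char* buffer, std::size_t size, const char* enum_name) {
  return std::snprintf(buffer, size, kLogFormat, enum_name);
}
```

Because the macro expands to a literal, the format string stays a compile-time
constant, which is what allows it to be tokenized.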
| |
| Example |
| ------- |
| .. literalinclude:: examples/basic_enum_test.cc |
| :language: cpp |
| :start-after: [pw_enum-examples-basic-cc-log] |
| :end-before: [pw_enum-examples-basic-cc-log] |
| |
| Build integration |
| ================= |
| ``pw_enum`` provides build integration for Bazel, GN, and CMake. |
| |
| .. tab-set:: |
| |
| .. tab-item:: Bazel |
| |
| Use the ``pw_cc_enum`` rule from :cs:`//pw_enum:pw_cc_enum.bzl`. |
| |
| .. literalinclude:: BUILD.bazel |
| :language: python |
| :start-after: [pw_enum-basic-bazel] |
| :end-before: [pw_enum-basic-bazel] |
| |
| .. tab-item:: GN |
| |
| Use the ``pw_cc_enum`` template from :cs:`"$dir_pw_enum/pw_cc_enum.gni" |
| <//pw_enum:pw_cc_enum.gni>`. |
| |
| .. literalinclude:: BUILD.gn |
| :start-after: [pw_enum-basic-gn] |
| :end-before: [pw_enum-basic-gn] |
| |
| .. tab-item:: CMake |
| |
| Use the ``pw_cc_enum`` function from :cs:`pw_enum/pw_cc_enum.cmake |
| <//pw_enum:pw_cc_enum.cmake>`. |
| |
| .. literalinclude:: CMakeLists.txt |
| :language: cmake |
| :start-after: [pw_enum-basic-cmake] |
| :end-before: [pw_enum-basic-cmake] |
| |
| ------------------ |
| Stringifying enums |
| ------------------ |
| :cc:`pw::EnumToString` returns a string version of an enum. It uses a FTADLE |
| extension point ``PwEnumToString(enum)``. FTADLE is a pattern that enables |
| customization by searching for a matching function via Argument-Dependent Lookup |
| (ADL). For more information, see `Designing Extension Points With FTADLE |
| <https://abseil.io/tips/218>`_. |
| |
| If you don't use ``pw_cc_enum``, you can manually use :cc:`PW_TOKENIZE_ENUM` |
| in :cs:`pw_tokenizer/enum.h <pw_tokenizer/public/pw_tokenizer/enum.h>` to |
| tokenize the enum and implement ``PwEnumToString``. |
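
The ADL-based lookup can be sketched as follows. This is a simplified
stand-in, not Pigweed's implementation: ``sketch::EnumToString`` is a
hypothetical entry point, and the ``PwEnumToString`` signature shown here
(returning ``std::string_view``) is an assumption.

```cpp
#include <string_view>

namespace sketch {

// Simplified stand-in for an EnumToString-style entry point. The unqualified
// call lets argument-dependent lookup find PwEnumToString in the namespace of
// the enum type.
template <typename Enum>
constexpr std::string_view EnumToString(Enum value) {
  return PwEnumToString(value);
}

}  // namespace sketch

namespace my_project {

enum class Color { kRed, kGreen };

// Extension point, declared in the same namespace as the enum so that ADL
// can find it.
constexpr std::string_view PwEnumToString(Color color) {
  switch (color) {
    case Color::kRed:
      return "RED";
    case Color::kGreen:
      return "GREEN";
  }
  return "";  // Value without a registered name.
}

}  // namespace my_project

static_assert(sketch::EnumToString(my_project::Color::kGreen) == "GREEN");
```

The entry point never names ``my_project``. The overload is selected only
because the argument type lives in that namespace.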
| |
| ---------------------- |
| Cross-language support |
| ---------------------- |
| ``pw_enum`` is currently C++-only, but could be expanded to support other |
| languages. The parser extracts the full enum definition and resolves all |
| enumerator values, so it would be straightforward to generate compatible enum |
| definitions for other languages from a C++ header. ``pw_enum`` could also |
| support an alternate format, such as JSON, for the original enum definition, and |
| generate C++ and other languages from that. |
| |
| ---------- |
| Background |
| ---------- |
| :ref:`module-pw_tokenizer` is one of Pigweed's most widely adopted features. It |
| has supported nested tokenization---a tokenized message inside another tokenized |
| message---since the early days. Initially, only Base64-encoded messages were |
| supported, which is inefficient. Support for directly encoding nested messages |
| as 32-bit integers was added later (see :ref:`seed-0105`). |
| |
| With support for encoding tokens as integers, supporting rich enums was a clear |
| next step. This culminated in the creation of :cs:`pw_tokenizer/enum.h |
| <pw_tokenizer/public/pw_tokenizer/enum.h>` and its supporting macros. This |
| approach uses the enum's integral value as a nested token, discriminated by its |
| namespace to avoid collisions between different enum types. The result is |
| highly efficient enum logs that are still readable and user-friendly. |
| |
| The need for versioning |
| ======================= |
| Real-world deployment of :cs:`pw_tokenizer/enum.h |
| <pw_tokenizer/public/pw_tokenizer/enum.h>` soon revealed a critical flaw. When |
| enum values were changed or reordered during development, the resulting tokens |
| changed. When merging token databases from different builds, this led to |
| collisions, where the same token mapped to different string representations. It |
| became clear that enum tokenization required versioning. |
| |
| Several alternatives were explored for automatic enum versioning. A key |
| constraint was that enum values must be able to be set with expressions, which |
| may reference constants or other enum values. Approaches considered included: |
| |
| * **Tokenize names instead of values**: Hash the enumerator names and generate |
| a function with a switch statement to map values to tokens at runtime. |
| * **Version in the domain**: Incorporate a hash of the enum's contents (names |
| and values) into the tokenization domain, requiring two arguments to log an |
| enum (the version and the value). |
| * **Calculate tokens from a base**: This approach used a hash of the enum's |
| contents as a base offset, adding the enum value to it at the call site. |
| |
| Ultimately, these approaches were ruled out because they increased code size |
| relative to the existing implementation, primarily due to the additional code |
| required at the call site. |
| |
| The code size penalty could be avoided if there were a ``constexpr`` way to |
| insert the enum's version into the log format string. Then, the existing token |
| logging macros could be used (``PW_LOG_FMT``). For example: |
| |
| .. code-block:: cpp |
| |
| // If there were a way to define this macro during compilation, versioned |
| // enums would have no code size cost relative to unversioned enums. |
| #define MY_ENUM_FMT PW_LOG_FMT("::my::Enum::version_1234") |
| |
| PW_LOG_INFO("My enum: " MY_ENUM_FMT, PW_LOG_ENUM(my_enum)); |
| |
| Unfortunately, there is no way to get the versioned enum domain into a |
| concatenatable string literal. This is required for compatibility with |
| ``pw_log``'s C-style API. If ``pw_log`` offered a C++-only API, this would be |
| feasible, but adding such an API was out of scope. |
| |
| Generating enums |
| ================ |
| Generating enums appeared to be the only way to get the enum's version into a |
| string literal at compile time. This led to the creation of ``pw_enum``. |
| |
| JSON definition |
| --------------- |
| The initial implementation of ``pw_enum`` generated C++ headers from JSON files. |
| While this worked well technically, it proved too difficult for projects to |
| adopt due to the friction of maintaining JSON definitions for standard C++ |
| enums. Protocol Buffers were considered in place of JSON, but they are too |
| limited for this use case. Protobufs do not support setting enumerator values |
| based on other enums or external constants. |
| |
| Parse C++ source |
| ---------------- |
| Finally, multiple approaches for parsing enum definitions out of C++ source code |
| were explored. These included: |
| |
| - Use `libclang <https://clang.llvm.org/docs/LibClang.html>`_ from Python to |
| parse header files. This would be robust and even perform ``constexpr`` |
| evaluation of enumerator values. Unfortunately, ``libclang`` is a large |
| dependency, and is not readily available on all platforms. |
| - Parse ``clang``'s `-ast-dump <https://clang.llvm.org/docs/IntroductionToTheClangAST.html#examining-the-ast>`_ |
| output. This would be fairly robust, but would involve parsing moderately |
| complex, non-standard text output intended for human consumption. It also |
| requires ``clang``, which not all projects build with. |
| - Use a custom Python parser. Parsing arbitrary C++ correctly is impractical, |
| and a partial parser would be brittle. |
| |
| Ultimately, parsing C++ source directly proved infeasible. Instead of parsing |
| arbitrary C++ source, the final design has users register each enum with the |
| :cc:`PW_ENUM(name, ...) <PW_ENUM>` macro. |
| |
| ------ |
| Design |
| ------ |
| |
| The final design of ``pw_enum`` addresses the constraints identified during its |
| evolution by combining standard C++ header files with a specialized build-time |
| generator powered by compile-time template evaluation. |
| |
| This architecture provides a seamless user experience with zero runtime overhead |
| and robust protection against database collisions, while maintaining the full |
| expressiveness of standard C++ enum definitions. |
| |
| Source files |
| ============ |
| Users define enums in standard C++ header files. To opt in to ``pw_enum``, the |
| header includes :cs:`pw_enum/generate.h` and registers each enum by calling the |
| :cc:`PW_ENUM(...) <PW_ENUM>` macro with the enum name and names of all of its |
| enumerators. |
| |
| The macro's primary purpose is to capture enum metadata in an easily parsable |
| format. ``PW_ENUM()`` expands to the enum name and list of enumerators, |
| surrounded by unique markers. A :cs:`Python script |
| <pw_enum/py/pw_enum/parse.py>` searches a preprocessed source file for the |
| markers and extracts the enum metadata. |
| |
| The macro also serves to require that users list the file in a ``pw_cc_enum`` |
| target (see `Build system integration`_), which is necessary for it to be |
| processed. If the file is not processed by ``pw_enum`` machinery, the macro |
| expands to ``static_assert(false)``, causing the build to fail with an |
| informative error. |
| |
| Enumerator evaluation |
| ===================== |
| Enumerator values can be defined by arbitrary C++ expressions. Because these |
| expressions may reference constants or enums from other files, the values may |
| change even if the header that declares the enum does not. |
| |
| The :cs:`parse.py <pw_enum/py/pw_enum/parse.py>` script evaluates enumerators |
| by generating a source file that references them. The source file instantiates |
| a template with the enumerators as template arguments. Compilation fails by |
| design, and the compiler's error output includes the enumerator values in an |
| easily parsable form. |
| |
| This solution is far from ideal, but has proven to be robust. It evaluates |
| enumerators with the same toolchain as the rest of the project. Printing |
| compile-time constants with failed template instantiations is a common |
| workaround to achieve compile-time "printf" functionality. The script searches |
| for a unique template name and doesn't depend on a particular compiler or |
| version. |
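
The compile-time "printf" technique can be sketched as below. The names are
hypothetical (``HypotheticalEnumValues`` is not the template the script uses),
and the failing line is left commented out so that the snippet itself
compiles.

```cpp
namespace my_project {

inline constexpr int kBase = 4;

// Enumerator values defined by expressions, as pw_enum permits.
enum class Level : int {
  kLow = 0,
  kMid = kBase / 2,
  kHigh = kBase * 2 - 1,
};

}  // namespace my_project

// Declared but never defined. Defining an object of this type is an error,
// and compilers typically spell out the template arguments in the diagnostic.
template <long long... kValues>
struct HypotheticalEnumValues;

// Uncommenting the declaration below makes compilation fail. The error
// message typically names the incomplete type together with the evaluated
// arguments (0, 2, and 7 here), which a script can then search for.
//
// HypotheticalEnumValues<static_cast<long long>(my_project::Level::kLow),
//                        static_cast<long long>(my_project::Level::kMid),
//                        static_cast<long long>(my_project::Level::kHigh)>
//     probe;

static_assert(static_cast<long long>(my_project::Level::kHigh) == 7);
```

Casting to an integer type is a choice made for this sketch so that the
diagnostic carries numeric values rather than enumerator names. The exact
wording of the error differs between compilers, which is why searching for a
unique template name is more robust than matching a full message.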
| |
| Build-time generation |
| ===================== |
| After parsing, the ``pw_cc_enum`` target runs a Python script |
| (:cs:`pw_enum/py/pw_enum/generate.py`) that generates a header. |
| |
| 1. **Enum generation**: The script generates a "shadowed" version of the header |
| in the build directory. This generated header contains the original content, |
| plus a footer with tokenization metadata. It also replaces the |
| ``PW_ENUM(...)`` calls with ``_PW_ENUM_GENERATED(...)``. In unprocessed |
| headers, ``PW_ENUM(...)`` expands to ``static_assert(false)``, which requires |
| users to build headers with ``pw_cc_enum``. |
| 2. **Versioning**: A unique version hash is calculated for each enum based on |
| its fully qualified name and the names and values of all its enumerators. |
| This hash is used to construct a unique tokenization domain (e.g., |
| ``::namespace::_pw_enum_HASH::EnumName``). This ensures that if the enum |
| changes, the domain changes, preventing collisions in merged token databases. |
| 3. **Tokenization**: The generated footer includes a call to |
| :cc:`PW_TOKENIZE_ENUM_CUSTOM` from :ref:`module-pw_tokenizer`, which |
| registers the enum values and their string representations in the database. |
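
The versioning step can be illustrated with a small content hash. This is a
sketch under stated assumptions: the document does not specify the hash
algorithm or its input encoding, so 32-bit FNV-1a is used here only as a
stand-in, and ``Enumerator``, ``HashText``, ``HashValue``, and ``VersionHash``
are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

struct Enumerator {
  std::string_view name;
  long long value;
};

// 32-bit FNV-1a step over text, used purely as a stand-in hash.
constexpr std::uint32_t HashText(std::string_view text, std::uint32_t hash) {
  for (char c : text) {
    hash ^= static_cast<unsigned char>(c);
    hash *= 16777619u;  // FNV prime.
  }
  return hash;
}

// Feeds the eight bytes of a value into the hash, least significant first.
constexpr std::uint32_t HashValue(long long value, std::uint32_t hash) {
  const auto bits = static_cast<unsigned long long>(value);
  for (int shift = 0; shift < 64; shift += 8) {
    hash ^= static_cast<std::uint32_t>((bits >> shift) & 0xff);
    hash *= 16777619u;
  }
  return hash;
}

// Hashes the fully qualified enum name plus every enumerator name and value.
template <std::size_t kCount>
constexpr std::uint32_t VersionHash(std::string_view qualified_name,
                                    const Enumerator (&enumerators)[kCount]) {
  std::uint32_t hash = HashText(qualified_name, 2166136261u);  // FNV offset.
  for (const Enumerator& enumerator : enumerators) {
    hash = HashText(enumerator.name, hash);
    hash = HashValue(enumerator.value, hash);
  }
  return hash;
}

inline constexpr Enumerator kBefore[] = {{"RED", 0}, {"GREEN", 1}};
inline constexpr Enumerator kAfter[] = {{"RED", 0}, {"GREEN", 2}};

// Changing a single enumerator value yields a different version.
static_assert(VersionHash("::my::Color", kBefore) !=
              VersionHash("::my::Color", kAfter));
```

In the design described above, a hash like this would be embedded in the
tokenization domain, so entries produced from ``kBefore`` and ``kAfter`` would
land in different domains rather than colliding.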
| |
| Build system integration |
| ======================== |
| The ``pw_cc_enum`` build rule automates the process of parsing C++ headers and |
| generating versioned enum metadata. It invokes the ``pw_enum`` generator with |
| the correct compilation flags and ensures the generated headers are prioritized |
| during compilation. |
| |
| * **Bazel** (:cs:`pw_cc_enum.bzl <//pw_enum:pw_cc_enum.bzl>`) creates an |
| internal library target to collect the compilation flags (includes, defines) |
| required to parse the header correctly and passes them to the Python script. |
| * **CMake** (:cs:`pw_cc_enum.cmake <//pw_enum:pw_cc_enum.cmake>`) creates an |
| internal interface library to collect includes and defines from dependencies, |
| and uses ``file(GENERATE)`` to produce a flags file for the generator. It |
| uses ``-iquote`` to ensure that the build system prioritizes the generated |
| shadowed header over the original source header. |
| * **GN** (:cs:`pw_cc_enum.gni <//pw_enum:pw_cc_enum.gni>`) compiles a |
| placeholder C++ file with the enum's dependencies to generate a target Ninja |
| file. The generator script parses the target Ninja file and the toolchain's |
| Ninja file to extract the compiler and its compilation flags (defines, |
| includes, and flags) to run the evaluation step. This is similar to how |
| :ref:`module-pw_compilation_testing` works. Like CMake, the GN build uses |
| ``-iquote`` to ensure that the build system prioritizes the generated shadowed |
| header over the original source header. |