pw_tokenizer: Move the API reference content
Change-Id: I5b1bf3ddd65ad8d97ffa9da2f6130b3ba160ad04
Reviewed-on: https://pigweed-review.googlesource.com/c/pigweed/pigweed/+/152073
Presubmit-Verified: CQ Bot Account <pigweed-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Auto-Submit <auto-submit@pigweed.google.com.iam.gserviceaccount.com>
Reviewed-by: Kayce Basques <kayce@google.com>
Pigweed-Auto-Submit: Kayce Basques <kayce@google.com>
Reviewed-by: Chad Norvell <chadnorvell@google.com>
diff --git a/pw_tokenizer/BUILD.gn b/pw_tokenizer/BUILD.gn
index d12d12a..62ad0f1 100644
--- a/pw_tokenizer/BUILD.gn
+++ b/pw_tokenizer/BUILD.gn
@@ -314,6 +314,7 @@
pw_doc_group("docs") {
sources = [
+ "api.rst",
"design.rst",
"docs.rst",
"proto.rst",
diff --git a/pw_tokenizer/api.rst b/pw_tokenizer/api.rst
new file mode 100644
index 0000000..3b81302
--- /dev/null
+++ b/pw_tokenizer/api.rst
@@ -0,0 +1,235 @@
+.. _module-pw_tokenizer-api:
+
+=============
+API reference
+=============
+.. pigweed-module-subpage::
+ :name: pw_tokenizer
+ :tagline: Cut your log sizes in half
+ :nav:
+ getting started: module-pw_tokenizer-get-started
+ design: module-pw_tokenizer-design
+ api: module-pw_tokenizer-api
+
+.. _module-pw_tokenizer-api-tokenization:
+
+------------
+Tokenization
+------------
+Tokenization converts a string literal to a token. If it's a printf-style
+string, its arguments are encoded along with it. The results of tokenization can
+be sent off device or stored in place of a full string.
+
+.. doxygentypedef:: pw_tokenizer_Token
+
+Tokenization macros
+===================
+Adding tokenization to a project is simple. To tokenize a string, include
+``pw_tokenizer/tokenize.h`` and invoke one of the ``PW_TOKENIZE_`` macros.
+
+Tokenize a string literal
+-------------------------
+``pw_tokenizer`` provides macros for tokenizing string literals with no
+arguments.
+
+.. doxygendefine:: PW_TOKENIZE_STRING
+.. doxygendefine:: PW_TOKENIZE_STRING_DOMAIN
+.. doxygendefine:: PW_TOKENIZE_STRING_MASK
+
+The tokenization macros above cannot be used inside other expressions.
+
+.. admonition:: **Yes**: Assign :c:macro:`PW_TOKENIZE_STRING` to a ``constexpr`` variable.
+ :class: checkmark
+
+ .. code:: cpp
+
+ constexpr uint32_t kGlobalToken = PW_TOKENIZE_STRING("Wowee Zowee!");
+
+ void Function() {
+ constexpr uint32_t local_token = PW_TOKENIZE_STRING("Wowee Zowee?");
+ }
+
+.. admonition:: **No**: Use :c:macro:`PW_TOKENIZE_STRING` in another expression.
+ :class: error
+
+ .. code:: cpp
+
+ void BadExample() {
+ ProcessToken(PW_TOKENIZE_STRING("This won't compile!"));
+ }
+
+ Use :c:macro:`PW_TOKENIZE_STRING_EXPR` instead.
+
+An alternate set of macros is provided for use inside expressions. These macros
+use lambda functions, so while they can be used inside expressions, they
+require C++ and cannot be assigned to ``constexpr`` variables or used with
+special function variables like ``__func__``.
+
+.. doxygendefine:: PW_TOKENIZE_STRING_EXPR
+.. doxygendefine:: PW_TOKENIZE_STRING_DOMAIN_EXPR
+.. doxygendefine:: PW_TOKENIZE_STRING_MASK_EXPR
+
+.. admonition:: When to use these macros
+
+ Use :c:macro:`PW_TOKENIZE_STRING` and related macros to tokenize string
+ literals that do not need %-style arguments encoded.
+
+.. admonition:: **Yes**: Use :c:macro:`PW_TOKENIZE_STRING_EXPR` within other expressions.
+ :class: checkmark
+
+ .. code:: cpp
+
+ void GoodExample() {
+ ProcessToken(PW_TOKENIZE_STRING_EXPR("This will compile!"));
+ }
+
+.. admonition:: **No**: Assign :c:macro:`PW_TOKENIZE_STRING_EXPR` to a ``constexpr`` variable.
+ :class: error
+
+ .. code:: cpp
+
+ constexpr uint32_t wont_work = PW_TOKENIZE_STRING_EXPR("This won't compile!");
+
+ Instead, use :c:macro:`PW_TOKENIZE_STRING` to assign to a ``constexpr`` variable.
+
+.. admonition:: **No**: Tokenize ``__func__`` in :c:macro:`PW_TOKENIZE_STRING_EXPR`.
+ :class: error
+
+ .. code:: cpp
+
+ void BadExample() {
+ // This compiles, but __func__ will not be the outer function's name, and
+ // there may be compiler warnings.
+ constexpr uint32_t wont_work = PW_TOKENIZE_STRING_EXPR(__func__);
+ }
+
+ Instead, use :c:macro:`PW_TOKENIZE_STRING` to tokenize ``__func__`` or similar macros.
+
+Tokenize a message with arguments to a buffer
+---------------------------------------------
+.. doxygendefine:: PW_TOKENIZE_TO_BUFFER
+.. doxygendefine:: PW_TOKENIZE_TO_BUFFER_DOMAIN
+.. doxygendefine:: PW_TOKENIZE_TO_BUFFER_MASK
+
+.. admonition:: Why use this macro
+
+ - Encode a tokenized message for consumption within a function.
+ - Encode a tokenized message into an existing buffer.
+
+ Avoid using ``PW_TOKENIZE_TO_BUFFER`` in widely expanded macros, such as a
+ logging macro, because it will result in larger code size than passing the
+ tokenized data to a function.
+
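+As a sketch of typical usage (not taken from the ``pw_tokenizer`` sources; the
+buffer size and the ``SendLogData`` transport function are hypothetical),
+``PW_TOKENIZE_TO_BUFFER`` encodes a message into a caller-provided buffer:
+
+.. code-block:: cpp
+
+ void LogBatteryLevel(int level) {
+   uint8_t buffer[32];
+   size_t size_bytes = sizeof(buffer);
+   // Encodes the token and the argument into buffer; size_bytes is updated
+   // to the number of bytes written.
+   PW_TOKENIZE_TO_BUFFER(buffer, &size_bytes, "Battery level: %d", level);
+   SendLogData(buffer, size_bytes);  // Hypothetical transport function.
+ }
+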
+.. _module-pw_tokenizer-custom-macro:
+
+Tokenize a message with arguments in a custom macro
+---------------------------------------------------
+Projects can leverage the tokenization machinery in whichever way best suits
+their needs. The most efficient way to use ``pw_tokenizer`` is to pass tokenized
+data to a global handler function. A project's custom tokenization macro can
+handle tokenized data in a function of its choosing.
+
+``pw_tokenizer`` provides two low-level macros that projects can use to create
+custom tokenization macros.
+
+.. doxygendefine:: PW_TOKENIZE_FORMAT_STRING
+.. doxygendefine:: PW_TOKENIZER_ARG_TYPES
+
+The outputs of these macros are typically passed to an encoding function. That
+function encodes the token, argument types, and argument data to a buffer using
+helpers provided by ``pw_tokenizer/encode_args.h``.
+
+.. doxygenfunction:: pw::tokenizer::EncodeArgs
+.. doxygenclass:: pw::tokenizer::EncodedMessage
+ :members:
+.. doxygenfunction:: pw_tokenizer_EncodeArgs
+
+Tokenizing function names
+=========================
+The string literal tokenization functions support tokenizing string literals or
+constexpr character arrays (``constexpr const char[]``). In GCC and Clang, the
+special ``__func__`` variable and ``__PRETTY_FUNCTION__`` extension are declared
+as ``static constexpr char[]`` in C++ instead of the standard ``static const
+char[]``. This means that ``__func__`` and ``__PRETTY_FUNCTION__`` can be
+tokenized while compiling C++ with GCC or Clang.
+
+.. code-block:: cpp
+
+ // Tokenize the special function name variables.
+ constexpr uint32_t function = PW_TOKENIZE_STRING(__func__);
+ constexpr uint32_t pretty_function = PW_TOKENIZE_STRING(__PRETTY_FUNCTION__);
+
+Note that ``__func__`` and ``__PRETTY_FUNCTION__`` are not string literals.
+They are defined as static character arrays, so they cannot be implicitly
+concatenated with string literals. For example, ``printf(__func__ ": %d",
+123);`` will not compile.
+
+Encoding
+========
+The token is a 32-bit hash calculated during compilation. A tokenized message
+is encoded as the token in little-endian byte order, followed by the encoded
+arguments, if any. For example, the 31-byte string
+``You can go about your business.`` hashes to 0xdac9a244, which is encoded as
+the 4 bytes ``44 a2 c9 da``.
+
+Arguments are encoded as follows:
+
+* **Integers** (1--10 bytes) --
+ `ZigZag and varint encoded <https://developers.google.com/protocol-buffers/docs/encoding#signed-integers>`_,
+ similarly to Protocol Buffers. Smaller values take fewer bytes.
+* **Floating point numbers** (4 bytes) -- Single precision floating point.
+* **Strings** (1--128 bytes) -- Length byte followed by the string contents.
+ The top bit of the length byte indicates whether the string was truncated. The
+ remaining 7 bits encode the string length, with a maximum of 127 bytes.
+
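+To make the encoding concrete, here is an illustrative Python sketch of the
+scheme above. The helper names are hypothetical, not the ``pw_tokenizer``
+implementation; see the ``pw_tokenizer.encode`` Python module for the real one.
+
+.. code-block:: python
+
+ import struct
+
+ def zigzag(value: int) -> int:
+     # Map signed to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ... (64-bit).
+     return ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF
+
+ def varint(value: int) -> bytes:
+     # Little-endian base-128: 7 bits per byte; the top bit marks continuation.
+     out = bytearray()
+     while True:
+         out.append((value & 0x7F) | (0x80 if value > 0x7F else 0))
+         value >>= 7
+         if not value:
+             return bytes(out)
+
+ def encode_string_arg(data: bytes, max_bytes: int = 127) -> bytes:
+     # Length byte: top bit set if truncated; low 7 bits hold the length.
+     truncated = len(data) > max_bytes
+     kept = data[:max_bytes]
+     return bytes([len(kept) | (0x80 if truncated else 0)]) + kept
+
+ # The token is sent little-endian, followed by the encoded arguments.
+ message = struct.pack('<I', 0xDAC9A244) + varint(zigzag(-1))
+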
+.. TODO(hepler): insert diagram here!
+
+.. tip::
+ ``%s`` arguments can quickly fill a tokenization buffer. Keep ``%s``
+ arguments short or avoid encoding them as strings (e.g. encode an enum as an
+ integer instead of a string). See also
+ :ref:`module-pw_tokenizer-tokenized-strings-as-args`.
+
+Buffer sizing helper
+--------------------
+.. doxygenfunction:: pw::tokenizer::MinEncodingBufferSizeBytes
+
+Token generation: fixed length hashing at compile time
+======================================================
+String tokens are generated using a modified version of the x65599 hash used by
+the SDBM project. All hashing is done at compile time.
+
+In C code, strings are hashed with a preprocessor macro. For compatibility with
+macros, the hash must be limited to a fixed maximum number of characters. This
+value is set by ``PW_TOKENIZER_CFG_C_HASH_LENGTH``. Increasing
+``PW_TOKENIZER_CFG_C_HASH_LENGTH`` increases the compilation time for C due to
+the complexity of the hashing macros.
+
+In C++, a ``constexpr`` hash function is used instead of a preprocessor macro.
+This function works with strings of any length and has a lower impact on
+compilation time than the C macros. For consistency, C++ tokenization uses the
+same hash algorithm, but the calculated values will differ between C and C++
+for strings longer than ``PW_TOKENIZER_CFG_C_HASH_LENGTH`` characters.
+
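+The hash can be sketched in Python. This is an unofficial sketch of the scheme
+described above (the hash starts from the string length, and each byte is
+scaled by a successive power of 65599); the canonical implementation is
+``pw_tokenizer.tokens.pw_tokenizer_65599_hash``.
+
+.. code-block:: python
+
+ def sketch_65599_hash(string: str, hash_length: int = None) -> int:
+     # Start from the string length; add each byte scaled by an increasing
+     # power of 65599, all modulo 2**32.
+     hash_value = len(string)
+     coefficient = 65599
+     for byte in string.encode()[:hash_length]:
+         hash_value = (hash_value + coefficient * byte) % 2**32
+         coefficient = (coefficient * 65599) % 2**32
+     return hash_value
+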
+Tokenization in Python
+======================
+The Python ``pw_tokenizer.encode`` module has limited support for encoding
+tokenized messages with the ``encode_token_and_args`` function.
+
+.. autofunction:: pw_tokenizer.encode.encode_token_and_args
+
+This function requires that the string's token has already been calculated.
+Typically these tokens are provided by a database, but they can be manually
+created using the tokenizer hash.
+
+.. autofunction:: pw_tokenizer.tokens.pw_tokenizer_65599_hash
+
+This is particularly useful for offline token database generation in cases where
+tokenized strings in a binary cannot be embedded as parsable pw_tokenizer
+entries.
+
+.. note::
+ In C, the hash length of a string has a fixed limit controlled by
+ ``PW_TOKENIZER_CFG_C_HASH_LENGTH``. To match tokens produced by C (as opposed
+ to C++) code, ``pw_tokenizer_65599_hash()`` should be called with a matching
+ hash length limit. When creating an offline database, it's a good idea to
+ generate tokens for both, and merge the databases.
diff --git a/pw_tokenizer/docs.rst b/pw_tokenizer/docs.rst
index 7d2e1e3..9f2cf99 100644
--- a/pw_tokenizer/docs.rst
+++ b/pw_tokenizer/docs.rst
@@ -54,7 +54,7 @@
1. Add ``pw_tokenizer`` to your build. Build files for GN, CMake, and Bazel are
provided. For Make or other build systems, add the files specified in the
BUILD.gn's ``pw_tokenizer`` target to the build.
-2. Use the tokenization macros in your code. See `Tokenization`_.
+2. Use the tokenization macros in your code. See :ref:`module-pw_tokenizer-api-tokenization`.
3. Add the contents of ``pw_tokenizer_linker_sections.ld`` to your project's
linker script. In GN and CMake, this step is done automatically.
4. Compile your code to produce an ELF file.
@@ -90,126 +90,14 @@
``pw_tokenizer_zephyr.ld`` which is added to the end of the linker file
via a call to ``zephyr_linker_sources(SECTIONS ...)``.
-.. _module-pw_tokenizer-api:
-
------------
Tokenization
------------
-Tokenization converts a string literal to a token. If it's a printf-style
-string, its arguments are encoded along with it. The results of tokenization can
-be sent off device or stored in place of a full string.
+See :ref:`module-pw_tokenizer-api-tokenization` in the API reference
+for detailed information about the tokenization API.
-.. doxygentypedef:: pw_tokenizer_Token
-
-Tokenization macros
-===================
-Adding tokenization to a project is simple. To tokenize a string, include
-``pw_tokenizer/tokenize.h`` and invoke one of the ``PW_TOKENIZE_`` macros.
-
-Tokenize a string literal
--------------------------
-``pw_tokenizer`` provides macros for tokenizing string literals with no
-arguments.
-
-.. doxygendefine:: PW_TOKENIZE_STRING
-.. doxygendefine:: PW_TOKENIZE_STRING_DOMAIN
-.. doxygendefine:: PW_TOKENIZE_STRING_MASK
-
-The tokenization macros above cannot be used inside other expressions.
-
-.. admonition:: **Yes**: Assign :c:macro:`PW_TOKENIZE_STRING` to a ``constexpr`` variable.
- :class: checkmark
-
- .. code:: cpp
-
- constexpr uint32_t kGlobalToken = PW_TOKENIZE_STRING("Wowee Zowee!");
-
- void Function() {
- constexpr uint32_t local_token = PW_TOKENIZE_STRING("Wowee Zowee?");
- }
-
-.. admonition:: **No**: Use :c:macro:`PW_TOKENIZE_STRING` in another expression.
- :class: error
-
- .. code:: cpp
-
- void BadExample() {
- ProcessToken(PW_TOKENIZE_STRING("This won't compile!"));
- }
-
- Use :c:macro:`PW_TOKENIZE_STRING_EXPR` instead.
-
-An alternate set of macros are provided for use inside expressions. These make
-use of lambda functions, so while they can be used inside expressions, they
-require C++ and cannot be assigned to constexpr variables or be used with
-special function variables like ``__func__``.
-
-.. doxygendefine:: PW_TOKENIZE_STRING_EXPR
-.. doxygendefine:: PW_TOKENIZE_STRING_DOMAIN_EXPR
-.. doxygendefine:: PW_TOKENIZE_STRING_MASK_EXPR
-
-.. admonition:: When to use these macros
-
- Use :c:macro:`PW_TOKENIZE_STRING` and related macros to tokenize string
- literals that do not need %-style arguments encoded.
-
-.. admonition:: **Yes**: Use :c:macro:`PW_TOKENIZE_STRING_EXPR` within other expressions.
- :class: checkmark
-
- .. code:: cpp
-
- void GoodExample() {
- ProcessToken(PW_TOKENIZE_STRING_EXPR("This will compile!"));
- }
-
-.. admonition:: **No**: Assign :c:macro:`PW_TOKENIZE_STRING_EXPR` to a ``constexpr`` variable.
- :class: error
-
- .. code:: cpp
-
- constexpr uint32_t wont_work = PW_TOKENIZE_STRING_EXPR("This won't compile!"));
-
- Instead, use :c:macro:`PW_TOKENIZE_STRING` to assign to a ``constexpr`` variable.
-
-.. admonition:: **No**: Tokenize ``__func__`` in :c:macro:`PW_TOKENIZE_STRING_EXPR`.
- :class: error
-
- .. code:: cpp
-
- void BadExample() {
- // This compiles, but __func__ will not be the outer function's name, and
- // there may be compiler warnings.
- constexpr uint32_t wont_work = PW_TOKENIZE_STRING_EXPR(__func__);
- }
-
- Instead, use :c:macro:`PW_TOKENIZE_STRING` to tokenize ``__func__`` or similar macros.
-
-.. _module-pw_tokenizer-custom-macro:
-
-Tokenize a message with arguments in a custom macro
----------------------------------------------------
-Projects can leverage the tokenization machinery in whichever way best suits
-their needs. The most efficient way to use ``pw_tokenizer`` is to pass tokenized
-data to a global handler function. A project's custom tokenization macro can
-handle tokenized data in a function of their choosing.
-
-``pw_tokenizer`` provides two low-level macros for projects to use
-to create custom tokenization macros.
-
-.. doxygendefine:: PW_TOKENIZE_FORMAT_STRING
-.. doxygendefine:: PW_TOKENIZER_ARG_TYPES
-
-The outputs of these macros are typically passed to an encoding function. That
-function encodes the token, argument types, and argument data to a buffer using
-helpers provided by ``pw_tokenizer/encode_args.h``.
-
-.. doxygenfunction:: pw::tokenizer::EncodeArgs
-.. doxygenclass:: pw::tokenizer::EncodedMessage
- :members:
-.. doxygenfunction:: pw_tokenizer_EncodeArgs
-
-Example
-^^^^^^^
+Example: tokenize a message with arguments in a custom macro
+============================================================
The following example implements a custom tokenization macro similar to
:ref:`module-pw_log_tokenized`.
@@ -274,101 +162,13 @@
- Pass additional arguments, such as metadata, with the tokenized message.
- Integrate ``pw_tokenizer`` with other systems.
-Tokenize a message with arguments to a buffer
----------------------------------------------
-.. doxygendefine:: PW_TOKENIZE_TO_BUFFER
-.. doxygendefine:: PW_TOKENIZE_TO_BUFFER_DOMAIN
-.. doxygendefine:: PW_TOKENIZE_TO_BUFFER_MASK
-
-.. admonition:: Why use this macro
-
- - Encode a tokenized message for consumption within a function.
- - Encode a tokenized message into an existing buffer.
-
- Avoid using ``PW_TOKENIZE_TO_BUFFER`` in widely expanded macros, such as a
- logging macro, because it will result in larger code size than passing the
- tokenized data to a function.
-
Binary logging with pw_tokenizer
================================
String tokenization can be used to convert plain text logs to a compact,
efficient binary format. See :ref:`module-pw_log_tokenized`.
-Tokenizing function names
-=========================
-The string literal tokenization functions support tokenizing string literals or
-constexpr character arrays (``constexpr const char[]``). In GCC and Clang, the
-special ``__func__`` variable and ``__PRETTY_FUNCTION__`` extension are declared
-as ``static constexpr char[]`` in C++ instead of the standard ``static const
-char[]``. This means that ``__func__`` and ``__PRETTY_FUNCTION__`` can be
-tokenized while compiling C++ with GCC or Clang.
-
-.. code-block:: cpp
-
- // Tokenize the special function name variables.
- constexpr uint32_t function = PW_TOKENIZE_STRING(__func__);
- constexpr uint32_t pretty_function = PW_TOKENIZE_STRING(__PRETTY_FUNCTION__);
-
-Note that ``__func__`` and ``__PRETTY_FUNCTION__`` are not string literals.
-They are defined as static character arrays, so they cannot be implicitly
-concatentated with string literals. For example, ``printf(__func__ ": %d",
-123);`` will not compile.
-
-Tokenization in Python
-======================
-The Python ``pw_tokenizer.encode`` module has limited support for encoding
-tokenized messages with the ``encode_token_and_args`` function.
-
-.. autofunction:: pw_tokenizer.encode.encode_token_and_args
-
-This function requires a string's token is already calculated. Typically these
-tokens are provided by a database, but they can be manually created using the
-tokenizer hash.
-
-.. autofunction:: pw_tokenizer.tokens.pw_tokenizer_65599_hash
-
-This is particularly useful for offline token database generation in cases where
-tokenized strings in a binary cannot be embedded as parsable pw_tokenizer
-entries.
-
-.. note::
- In C, the hash length of a string has a fixed limit controlled by
- ``PW_TOKENIZER_CFG_C_HASH_LENGTH``. To match tokens produced by C (as opposed
- to C++) code, ``pw_tokenizer_65599_hash()`` should be called with a matching
- hash length limit. When creating an offline database, it's a good idea to
- generate tokens for both, and merge the databases.
-
-Encoding
-========
-The token is a 32-bit hash calculated during compilation. The string is encoded
-little-endian with the token followed by arguments, if any. For example, the
-31-byte string ``You can go about your business.`` hashes to 0xdac9a244.
-This is encoded as 4 bytes: ``44 a2 c9 da``.
-
-Arguments are encoded as follows:
-
-* **Integers** (1--10 bytes) --
- `ZagZag and varint encoded <https://developers.google.com/protocol-buffers/docs/encoding#signed-integers>`_,
- similarly to Protocol Buffers. Smaller values take fewer bytes.
-* **Floating point numbers** (4 bytes) -- Single precision floating point.
-* **Strings** (1--128 bytes) -- Length byte followed by the string contents.
- The top bit of the length whether the string was truncated or not. The
- remaining 7 bits encode the string length, with a maximum of 127 bytes.
-
-.. TODO(hepler): insert diagram here!
-
-.. tip::
- ``%s`` arguments can quickly fill a tokenization buffer. Keep ``%s``
- arguments short or avoid encoding them as strings (e.g. encode an enum as an
- integer instead of a string). See also
- :ref:`module-pw_tokenizer-tokenized-strings-as-args`.
-
-Buffer sizing helper
---------------------
-.. doxygenfunction:: pw::tokenizer::MinEncodingBufferSizeBytes
-
Encoding command line utility
------------------------------
+=============================
The ``pw_tokenizer.encode`` command line tool can be used to encode tokenized
strings.
@@ -389,23 +189,6 @@
See ``--help`` for full usage details.
-Token generation: fixed length hashing at compile time
-======================================================
-String tokens are generated using a modified version of the x65599 hash used by
-the SDBM project. All hashing is done at compile time.
-
-In C code, strings are hashed with a preprocessor macro. For compatibility with
-macros, the hash must be limited to a fixed maximum number of characters. This
-value is set by ``PW_TOKENIZER_CFG_C_HASH_LENGTH``. Increasing
-``PW_TOKENIZER_CFG_C_HASH_LENGTH`` increases the compilation time for C due to
-the complexity of the hashing macros.
-
-C++ macros use a constexpr function instead of a macro. This function works with
-any length of string and has lower compilation time impact than the C macros.
-For consistency, C++ tokenization uses the same hash algorithm, but the
-calculated values will differ between C and C++ for strings longer than
-``PW_TOKENIZER_CFG_C_HASH_LENGTH`` characters.
-
.. _module-pw_tokenizer-domains:
Tokenization domains
@@ -1054,4 +837,5 @@
:hidden:
:maxdepth: 1
+ api
design