.. _module-pw_tokenizer:
============
pw_tokenizer
============
.. pigweed-module::
:name: pw_tokenizer
Logging is critical, but developers are often forced to choose between
additional logging and saving crucial flash space. The ``pw_tokenizer`` module
enables **extensive logging with substantially less memory usage** by replacing
printf-style strings with binary tokens during compilation. It is designed to
integrate easily into existing logging systems.
Although the most common application of ``pw_tokenizer`` is binary logging,
**the tokenizer is general purpose and can be used to tokenize any strings**,
with or without printf-style arguments.
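For example, :c:macro:`PW_TOKENIZE_STRING` hashes a format string to a 32-bit
token at compile time. A minimal sketch (the variable name is illustrative):

.. code-block:: cpp

   #include "pw_tokenizer/tokenize.h"

   // The format string is replaced with a 32-bit token during compilation;
   // the string literal itself is not stored in the binary.
   constexpr uint32_t kBatteryToken =
       PW_TOKENIZE_STRING("Battery Voltage: %d mV");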
Why tokenize strings?
* **Dramatically reduce binary size** by removing string literals from binaries.
* **Reduce I/O traffic, RAM, and flash usage** by sending and storing compact tokens
instead of strings. We've seen over 50% reduction in encoded log contents.
* **Reduce CPU usage** by replacing ``snprintf`` calls with simple tokenization code.
* **Remove potentially sensitive log, assert, and other strings** from binaries.
.. grid:: 1
.. grid-item-card:: :octicon:`rocket` Get started
:link: module-pw_tokenizer-get-started
:link-type: ref
:class-item: sales-pitch-cta-primary
Integrate pw_tokenizer into your project.
.. grid:: 2
.. grid-item-card:: :octicon:`code-square` Tokenization
:link: module-pw_tokenizer-tokenization
:link-type: ref
:class-item: sales-pitch-cta-secondary
Convert strings and arguments to tokens.
.. grid-item-card:: :octicon:`code-square` Token databases
:link: module-pw_tokenizer-token-databases
:link-type: ref
:class-item: sales-pitch-cta-secondary
Store a mapping of tokens to the strings and arguments they represent.
.. grid:: 2
.. grid-item-card:: :octicon:`code-square` Detokenization
:link: module-pw_tokenizer-detokenization
:link-type: ref
:class-item: sales-pitch-cta-secondary
Expand tokens back to the strings and arguments they represent.
.. grid-item-card:: :octicon:`info` API reference
:link: module-pw_tokenizer-api
:link-type: ref
:class-item: sales-pitch-cta-secondary
Detailed reference information about the pw_tokenizer API.
.. _module-pw_tokenizer-tokenized-logging-example:
---------------------------
Tokenized logging in action
---------------------------
Here's an example of how ``pw_tokenizer`` enables you to store
and send the same logging information using significantly fewer
resources:
.. mermaid::
flowchart TD
subgraph after["After: Tokenized Logs (37 bytes saved!)"]
after_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 4 bytes stored on-device as... -->
after_encoding["d9 28 47 8e"] -- 6 bytes sent over the wire as... -->
after_transmission["d9 28 47 8e aa 3e"] -- Displayed in logs as... -->
after_display["#quot;Battery Voltage: 3989 mV#quot;"]
end
subgraph before["Before: No Tokenization"]
before_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 41 bytes stored on-device as... -->
before_encoding["#quot;Battery Voltage: %d mV#quot;"] -- 43 bytes sent over the wire as... -->
before_transmission["#quot;Battery Voltage: 3989 mV#quot;"] -- Displayed in logs as... -->
before_display["#quot;Battery Voltage: 3989 mV#quot;"]
end
style after stroke:#00c852,stroke-width:3px
style before stroke:#ff5252,stroke-width:3px
A quick overview of how the tokenized version works:
* You tokenize ``"Battery Voltage: %d mV"`` with a macro like
:c:macro:`PW_TOKENIZE_STRING`. You can use :ref:`module-pw_log_tokenized`
to handle the tokenization automatically.
* After tokenization, ``"Battery Voltage: %d mV"`` becomes ``d9 28 47 8e``.
* The first 4 bytes sent over the wire are the tokenized version of
  ``"Battery Voltage: %d mV"``. The last 2 bytes are the value of ``voltage``
  converted to a varint using :ref:`module-pw_varint`. See the encoding
  sketch after this list.
* The logs are converted back to the original, human-readable message
via the :ref:`Detokenization API <module-pw_tokenizer-detokenization>`
and a :ref:`token database <module-pw_tokenizer-token-databases>`.
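Putting it together, here's a minimal sketch of how the on-device side of the
diagram could look. It uses the :c:macro:`PW_TOKENIZE_TO_BUFFER` macro;
``SendOverTheWire`` is a hypothetical transport function, and the buffer size
is arbitrary:

.. code-block:: cpp

   #include <stddef.h>
   #include <stdint.h>

   #include "pw_tokenizer/tokenize.h"

   void SendOverTheWire(const uint8_t* data, size_t size);  // Hypothetical.

   void LogBatteryVoltage(int voltage) {
     uint8_t buffer[32];
     size_t size = sizeof(buffer);
     // Encodes the 4-byte token for "Battery Voltage: %d mV" followed by the
     // varint-encoded argument (6 bytes total when voltage is 3989).
     PW_TOKENIZE_TO_BUFFER(buffer, &size, "Battery Voltage: %d mV", voltage);
     SendOverTheWire(buffer, size);
   }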
.. toctree::
:hidden:
:maxdepth: 1
Get started <get_started>
tokenization
token_databases
detokenization
API reference <api>