| .. _module-pw_protobuf: |
| |
| =========== |
| pw_protobuf |
| =========== |
| The protobuf module provides a lightweight interface for encoding and decoding |
| the Protocol Buffer wire format. |
| |
| .. note:: |
| |
| The protobuf module is a work in progress. Wire format encoding and decoding |
| is supported, though the APIs are not final. C++ code generation exists for |
| encoding, but not decoding. |
| |
| Design |
| ====== |
| Unlike other protobuf libraries, which typically provide in-memory data |
| structures to represent protobuf messages, ``pw_protobuf`` operates directly on |
| the wire format and leaves data storage to the user. This has a few benefits. |
| The primary one is that it allows the library to be incredibly small, with the |
| encoder and decoder each having a code size of around 1.5K and negligible RAM |
| usage. Users can choose the tradeoffs most suitable for their product on top of |
| this core implementation. |
| |
| ``pw_protobuf`` also provides zero-overhead C++ code generation which wraps its |
| low-level wire format operations with a user-friendly API for processing |
| specific protobuf messages. The code generation integrates with Pigweed's GN |
| build system. |
| |
| Configuration |
| ============= |
| ``pw_protobuf`` supports the following configuration options. |
| |
| * ``PW_PROTOBUF_CFG_MAX_VARINT_SIZE``: |
| When encoding nested messages, the number of bytes to reserve for the varint |
| submessage length. Nested messages are limited in size to the maximum value |
| that can be varint-encoded into this reserved space. |
| |
| The values that can be set, and their corresponding maximum submessage |
| lengths, are outlined below. |
| |
| +-------------------+----------------------------------------+ |
| | MAX_VARINT_SIZE | Maximum submessage length | |
| +===================+========================================+ |
| | 1 byte | 127 | |
| +-------------------+----------------------------------------+ |
| | 2 bytes | 16,383 or < 16KiB | |
| +-------------------+----------------------------------------+ |
| | 3 bytes | 2,097,151 or < 2048KiB | |
| +-------------------+----------------------------------------+ |
| | 4 bytes (default) | 268,435,455 or < 256MiB | |
| +-------------------+----------------------------------------+ |
| | 5 bytes | 4,294,967,295 or < 4GiB (max uint32_t) | |
| +-------------------+----------------------------------------+ |
| |
| ======== |
| Encoding |
| ======== |
| |
| Usage |
| ===== |
| Pigweed's protobuf encoders encode directly to the wire format of a proto rather |
| than staging information to a mutable datastructure. This means any writes of a |
| value are final, and can't be referenced or modified as a later step in the |
| encode process. |
| |
| MemoryEncoder |
| ============= |
| A MemoryEncoder directly encodes a proto to an in-memory buffer. |
| |
| .. Code:: cpp |
| |
| // Writes a proto response to the provided buffer, returning the encode |
| // status and number of bytes written. |
| StatusWithSize WriteProtoResponse(ByteSpan response) { |
| // All proto writes are directly written to the `response` buffer. |
| MemoryEncoder encoder(response); |
| encoder.WriteUint32(kMagicNumberField, 0x1a1a2b2b); |
| encoder.WriteString(kFavoriteFood, "cookies"); |
| return StatusWithSize(encoder.status(), encoder.size()); |
| } |
| |
| StreamEncoder |
| ============= |
| pw_protobuf's StreamEncoder class operates on pw::stream::Writer objects to |
| serialized proto data. This means you can directly encode a proto to something |
| like pw::sys_io without needing to build the complete message in memory first. |
| |
| .. Code:: cpp |
| |
| #include "pw_protobuf/encoder.h" |
| #include "pw_stream/sys_io_stream.h" |
| #include "pw_bytes/span.h" |
| |
| pw::stream::SysIoWriter sys_io_writer; |
| pw::protobuf::StreamEncoder my_proto_encoder(sys_io_writer, |
| pw::ByteSpan()); |
| |
| // Once this line returns, the field has been written to the Writer. |
| my_proto_encoder.WriteInt64(kTimestampFieldNumber, system::GetUnixEpoch()); |
| |
| // There's no intermediate buffering when writing a string directly to a |
| // StreamEncoder. |
| my_proto_encoder.WriteString(kWelcomeMessageFieldNumber, |
| "Welcome to Pigweed!"); |
| if (!my_proto_encoder.status().ok()) { |
| PW_LOG_INFO("Failed to encode proto; %s", my_proto_encoder.status().str()); |
| } |
| |
| Nested submessages |
| ================== |
| Writing proto messages with nested submessages requires buffering due to |
| limitations of the proto format. Every proto submessage must know the size of |
| the submessage before its final serialization can begin. A streaming encoder can |
| be passed a scratch buffer to use when constructing nested messages. All |
| submessage data is buffered to this scratch buffer until the submessage is |
| finalized. Note that the contents of this scratch buffer is not necessarily |
| valid proto data, so don't try to use it directly. |
| |
| MemoryEncoder objects use the final destination buffer rather than relying on a |
| scratch buffer. Note that this means your destination buffer might need |
| additional space for overhead incurred by nesting submessages. The |
| ``MaxScratchBufferSize()`` helper function can be useful in estimating how much |
| space to allocate to account for nested submessage encoding overhead. |
| |
| .. Code:: cpp |
| |
| #include "pw_protobuf/encoder.h" |
| #include "pw_stream/sys_io_stream.h" |
| #include "pw_bytes/span.h" |
| |
| pw::stream::SysIoWriter sys_io_writer; |
| // The scratch buffer should be at least as big as the largest nested |
| // submessage. It's a good idea to be a little generous. |
| std::byte submessage_scratch_buffer[64]; |
| |
| // Provide the scratch buffer to the proto encoder. The buffer's lifetime must |
| // match the lifetime of the encoder. |
| pw::protobuf::StreamEncoder my_proto_encoder(sys_io_writer, |
| submessage_scratch_buffer); |
| |
| { |
| // Note that the parent encoder, my_proto_encoder, cannot be used until the |
| // nested encoder, nested_encoder, has been destroyed. |
| StreamEncoder nested_encoder = |
| my_proto_encoder.GetNestedEncoder(kPetsFieldNumber); |
| |
| // There's intermediate buffering when writing to a nested encoder. |
| nested_encoder.WriteString(kNameFieldNumber, "Spot"); |
| nested_encoder.WriteString(kPetTypeFieldNumber, "dog"); |
| |
| // When this scope ends, the nested encoder is serialized to the Writer. |
| // In addition, the parent encoder, my_proto_encoder, can be used again. |
| } |
| |
| // If an encode error occurs when encoding the nested messages, it will be |
| // reflected at the root encoder. |
| if (!my_proto_encoder.status().ok()) { |
| PW_LOG_INFO("Failed to encode proto; %s", my_proto_encoder.status().str()); |
| } |
| |
| .. warning:: |
| When a nested submessage is created, any use of the parent encoder that |
| created the nested encoder will trigger a crash. To resume using the parent |
| encoder, destroy the submessage encoder first. |
| |
| Error Handling |
| ============== |
| While individual write calls on a proto encoder return pw::Status objects, the |
| encoder tracks all status returns and "latches" onto the first error |
| encountered. This status can be accessed via ``StreamEncoder::status()``. |
| |
| Codegen |
| ======= |
| pw_protobuf encoder codegen integration is supported in GN, Bazel, and CMake. |
| The codegen is just a light wrapper around the ``StreamEncoder`` and |
| ``MemoryEncoder`` objects, providing named helper functions to write proto |
| fields rather than requiring that field numbers are directly passed to an |
| encoder. Namespaced proto enums are also generated, and used as the arguments |
| when writing enum fields of a proto message. |
| |
| All generated messages provide a ``Fields`` enum that can be used directly for |
| out-of-band encoding, or with the ``pw::protobuf::Decoder``. |
| |
| This module's codegen is available through the ``*.pwpb`` sub-target of a |
| ``pw_proto_library`` in GN, CMake, and Bazel. See :ref:`pw_protobuf_compiler's |
| documentation <module-pw_protobuf_compiler>` for more information on build |
| system integration for pw_protobuf codegen. |
| |
| Example ``BUILD.gn``: |
| |
| .. Code:: none |
| |
| import("//build_overrides/pigweed.gni") |
| |
| import("$dir_pw_build/target_types.gni") |
| import("$dir_pw_protobuf_compiler/proto.gni") |
| |
| # This target controls where the *.pwpb.h headers end up on the include path. |
| # In this example, it's at "pet_daycare_protos/client.pwpb.h". |
| pw_proto_library("pet_daycare_protos") { |
| sources = [ |
| "pet_daycare_protos/client.proto", |
| ] |
| } |
| |
| pw_source_set("example_client") { |
| sources = [ "example_client.cc" ] |
| deps = [ |
| ":pet_daycare_protos.pwpb", |
| dir_pw_bytes, |
| dir_pw_stream, |
| ] |
| } |
| |
| Example ``pet_daycare_protos/client.proto``: |
| |
| .. Code:: none |
| |
| syntax = "proto3"; |
| // The proto package controls the namespacing of the codegen. If this package |
| // were fuzzy.friends, the namespace for codegen would be fuzzy::friends::*. |
| package fuzzy_friends; |
| |
| message Pet { |
| string name = 1; |
| string pet_type = 2; |
| } |
| |
| message Client { |
| repeated Pet pets = 1; |
| } |
| |
| Example ``example_client.cc``: |
| |
| .. Code:: cpp |
| |
| #include "pet_daycare_protos/client.pwpb.h" |
| #include "pw_protobuf/encoder.h" |
| #include "pw_stream/sys_io_stream.h" |
| #include "pw_bytes/span.h" |
| |
| pw::stream::SysIoWriter sys_io_writer; |
| std::byte submessage_scratch_buffer[64]; |
| // The constructor is the same as a pw::protobuf::StreamEncoder. |
| fuzzy_friends::Client::StreamEncoder client(sys_io_writer, |
| submessage_scratch_buffer); |
| { |
| fuzzy_friends::Pet::StreamEncoder pet1 = client.GetPetsEncoder(); |
| pet1.WriteName("Spot"); |
| pet1.WritePetType("dog"); |
| } |
| |
| { |
| fuzzy_friends::Pet::StreamEncoder pet2 = client.GetPetsEncoder(); |
| pet2.WriteName("Slippers"); |
| pet2.WritePetType("rabbit"); |
| } |
| |
| if (!client.status().ok()) { |
| PW_LOG_INFO("Failed to encode proto; %s", client.status().str()); |
| } |
| |
| ======== |
| Decoding |
| ======== |
| ``pw_protobuf`` provides two decoder implementations, which are described below. |
| |
| Decoder |
| ======= |
| The ``Decoder`` class operates on an protobuf message located in a buffer in |
| memory. It provides an iterator-style API for processing a message. Calling |
| ``Next()`` advances the decoder to the next proto field, which can then be read |
| by calling the appropriate ``Read*`` function for the field number. |
| |
| When reading ``bytes`` and ``string`` fields, the decoder returns a view of that |
| field within the buffer; no data is copied out. |
| |
| .. note:: |
| |
| ``pw::protobuf::Decoder`` will soon be renamed ``pw::protobuf::MemoryDecoder`` |
| for clarity and consistency. |
| |
| .. code-block:: c++ |
| |
| #include "pw_protobuf/decoder.h" |
| #include "pw_status/try.h" |
| |
| pw::Status DecodeProtoFromBuffer(std::span<const std::byte> buffer) { |
| pw::protobuf::Decoder decoder(buffer); |
| pw::Status status; |
| |
| uint32_t uint32_field; |
| std::string_view string_field; |
| |
| // Iterate over the fields in the message. A return value of OK indicates |
| // that a valid field has been found and can be read. When the decoder |
| // reaches the end of the message, Next() will return OUT_OF_RANGE. |
| // Other return values indicate an error trying to decode the message. |
| while ((status = decoder.Next()).ok()) { |
| switch (decoder.FieldNumber()) { |
| case 1: |
| PW_TRY(decoder.ReadUint32(&uint32_field)); |
| break; |
| case 2: |
| // The passed-in string_view will point to the contents of the string |
| // field within the buffer. |
| PW_TRY(decoder.ReadString(&string_field)); |
| break; |
| } |
| } |
| |
| // Do something with the fields... |
| |
| return status.IsOutOfRange() ? OkStatus() : status; |
| } |
| |
| StreamDecoder |
| ============= |
| Sometimes, a serialized protobuf message may be too large to fit into an |
| in-memory buffer. To faciliate working with that type of data, ``pw_protobuf`` |
| provides a ``StreamDecoder`` which reads data from a |
| ``pw::stream::SeekableReader``. |
| |
| .. admonition:: When to use a stream decoder |
| |
| The ``StreamDecoder`` should only be used in cases where the protobuf data |
| cannot be read directly from a buffer. It is unadvisable to use a |
| ``StreamDecoder`` with a ``MemoryStream`` --- the decoding operations will be |
| far less efficient than the ``Decoder``, which is optimized for in-memory |
| messages. |
| |
| The general usage of a ``StreamDecoder`` is similar to the basic ``Decoder``, |
| with the exception of ``bytes`` and ``string`` fields, which must be copied out |
| of the stream into a provided buffer. |
| |
| .. code-block:: c++ |
| |
| #include "pw_protobuf/decoder.h" |
| #include "pw_status/try.h" |
| |
| pw::Status DecodeProtoFromStream(pw::stream::SeekableReader& reader) { |
| pw::protobuf::StreamDecoder decoder(reader); |
| pw::Status status; |
| |
| uint32_t uint32_field; |
| char string_field[16]; |
| |
| // Iterate over the fields in the message. A return value of OK indicates |
| // that a valid field has been found and can be read. When the decoder |
| // reaches the end of the message, Next() will return OUT_OF_RANGE. |
| // Other return values indicate an error trying to decode the message. |
| while ((status = decoder.Next()).ok()) { |
| // FieldNumber() returns a Result<uint32_t> as it may fail sometimes. |
| // However, FieldNumber() is guaranteed to be valid after a call to Next() |
| // that returns OK, so the value can be used directly here. |
| switch (decoder.FieldNumber().value()) { |
| case 1: { |
| Result<uint32_t> result = decoder.ReadUint32(); |
| if (result.ok()) { |
| uint32_field = result.value(); |
| } |
| break; |
| } |
| |
| case 2: |
| // The string field is copied into the provided buffer. If the buffer |
| // is too small to fit the string, RESOURCE_EXHAUSTED is returned and |
| // the decoder is not advanced, allowing the field to be re-read. |
| PW_TRY(decoder.ReadString(string_field)); |
| break; |
| } |
| } |
| |
| // Do something with the fields... |
| |
| return status.IsOutOfRange() ? OkStatus() : status; |
| } |
| |
| The ``StreamDecoder`` can also return a ``Stream::SeekableReader`` for reading |
| bytes fields, avoiding the need to copy data out directly. |
| |
| .. code-block:: c++ |
| |
| if (decoder.FieldNumber() == 3) { |
| // bytes my_bytes_field = 3; |
| pw::protobuf::StreamDecoder::BytesReader bytes_reader = |
| decoder.GetBytesReader(); |
| |
| // Read data incrementally through the bytes_reader. While the reader is |
| // active, any attempts to use the decoder will result in a crash. When the |
| // reader goes out of scope, it will close itself and reactive the decoder. |
| } |
| |
| If the current field is a nested protobuf message, the ``StreamDecoder`` can |
| provide a decoder for the nested message. While the nested decoder is active, |
| its parent decoder cannot be used. |
| |
| .. code-block:: c++ |
| |
| if (decoder.FieldNumber() == 4) { |
| pw::protobuf::StreamDecoder nested_decoder = decoder.GetNestedDecoder(); |
| |
| while (nested_decoder.Next().ok()) { |
| // Process the nested message. |
| } |
| |
| // Once the nested decoder goes out of scope, it closes itself, and the |
| // parent decoder can be used again. |
| } |
| |
| Proto map encoding utils |
| ======================== |
| |
| Some additional helpers for encoding more complex but common protobuf |
| submessages (e.g. map<string, bytes>) are provided in |
| ``pw_protobuf/map_utils.h``. |
| |
| .. Note:: |
| The helper API are currently in-development and may not remain stable. |
| |
| Message |
| ======= |
| |
| The module implements a message parsing class ``Message``, in |
| ``pw_protobuf/message.h``, to faciliate proto message parsing and field access. |
| The class provides interfaces for searching fields in a proto message and |
| creating helper classes for it according to its interpreted field type, i.e. |
| uint32, bytes, string, map<>, repeated etc. The class works on top of |
| ``StreamDecoder`` and thus requires a ``pw::stream::SeekableReader`` for proto |
| message access. The following gives examples for using the class to process |
| different fields in a proto message: |
| |
| .. code-block:: c++ |
| |
| // Consider the proto messages defined as follows: |
| // |
| // message Nested { |
| // string nested_str = 1; |
| // bytes nested_bytes = 2; |
| // } |
| // |
| // message { |
| // uint32 integer = 1; |
| // string str = 2; |
| // bytes bytes = 3; |
| // Nested nested = 4; |
| // repeated string rep_str = 5; |
| // repeated Nested rep_nested = 6; |
| // map<string, bytes> str_to_bytes = 7; |
| // map<string, Nested> str_to_nested = 8; |
| // } |
| |
| // Given a seekable `reader` that reads the top-level proto message, and |
| // a <proto_size> that gives the size of the proto message: |
| Message message(reader, proto_size); |
| |
| // Parse a proto integer field |
| Uint32 integer = messasge_parser.AsUint32(1); |
| if (!integer.ok()) { |
| // handle parsing error. i.e. return integer.status(). |
| } |
| uint32_t integer_value = integer.value(); // obtained the value |
| |
| // Parse a string field |
| String str = message.AsString(2); |
| if (!str.ok()) { |
| // handle parsing error. i.e. return str.status(); |
| } |
| |
| // check string equal |
| Result<bool> str_check = str.Equal("foo"); |
| |
| // Parse a bytes field |
| Bytes bytes = message.AsBytes(3); |
| if (!bytes.ok()) { |
| // handle parsing error. i.e. return bytes.status(); |
| } |
| |
| // Get a reader to the bytes. |
| stream::IntervalReader bytes_reader = bytes.GetBytesReader(); |
| |
| // Parse nested message `Nested nested = 4;` |
| Message nested = message.AsMessage(4). |
| // Get the fields in the nested message. |
| String nested_str = nested.AsString(1); |
| Bytes nested_bytes = nested.AsBytes(2); |
| |
| // Parse repeated field `repeated string rep_str = 5;` |
| RepeatedStrings rep_str = message.AsRepeatedString(5); |
| // Iterate through the entries. For iteration |
| for (String element : rep_str) { |
| // Process str |
| } |
| |
| // Parse repeated field `repeated Nested rep_nested = 6;` |
| RepeatedStrings rep_str = message.AsRepeatedString(6); |
| // Iterate through the entries. For iteration |
| for (Message element : rep_rep_nestedstr) { |
| // Process element |
| } |
| |
| // Parse map field `map<string, bytes> str_to_bytes = 7;` |
| StringToBytesMap str_to_bytes = message.AsStringToBytesMap(7); |
| // Access the entry by a given key value |
| Bytes bytes_for_key = str_to_bytes["key"]; |
| // Or iterate through map entries |
| for (StringToBytesMapEntry entry : str_to_bytes) { |
| String key = entry.Key(); |
| Bytes value = entry.Value(); |
| // process entry |
| } |
| |
| // Parse map field `map<string, Nested> str_to_nested = 8;` |
| StringToMessageMap str_to_nested = message.AsStringToBytesMap(8); |
| // Access the entry by a given key value |
| Message nested_for_key = str_to_nested["key"]; |
| // Or iterate through map entries |
| for (StringToMessageMapEntry entry : str_to_nested) { |
| String key = entry.Key(); |
| Message value = entry.Value(); |
| // process entry |
| } |
| |
| The methods in ``Message`` for parsing a single field, i.e. everty `AsXXX()` |
| method except AsRepeatedXXX() and AsStringMapXXX(), internally performs a |
| linear scan of the entire proto message to find the field with the given |
| field number. This can be expensive if performed multiple times, especially |
| on slow reader. The same applies to the ``operator[]`` of StringToXXXXMap |
| helper class. Therefore, for performance consideration, whenever possible, it |
| is recommended to use the following for-range style to iterate and process |
| single fields directly. |
| |
| |
| .. code-block:: c++ |
| |
| for (Message::Field field : message) { |
| if (field.field_number() == 1) { |
| Uint32 integer = field.As<Uint32>(); |
| ... |
| } else if (field.field_number() == 2) { |
| String str = field.As<String>(); |
| ... |
| } else if (field.field_number() == 3) { |
| Bytes bytes = field.As<Bytes>(); |
| ... |
| } else if (field.field_number() == 4) { |
| Message nested = field.As<Message>(); |
| ... |
| } |
| } |
| |
| |
| .. Note:: |
| The helper API are currently in-development and may not remain stable. |
| |
| Size report |
| =========== |
| |
| Full size report |
| ---------------- |
| |
| This report demonstrates the size of using the entire decoder with all of its |
| decode methods and a decode callback for a proto message containing each of the |
| protobuf field types. |
| |
| .. include:: size_report/decoder_full |
| |
| |
| Incremental size report |
| ----------------------- |
| |
| This report is generated using the full report as a base and adding some int32 |
| fields to the decode callback to demonstrate the incremental cost of decoding |
| fields in a message. |
| |
| .. include:: size_report/decoder_incremental |
| |
| ========================== |
| Available protobuf modules |
| ========================== |
| There are a handful of messages ready to be used in Pigweed projects. These are |
| located in ``pw_protobuf/pw_protobuf_protos``. |
| |
| common.proto |
| ============ |
| Contains Empty message proto used in many RPC calls. |
| |
| |
| status.proto |
| ============ |
| Contains the enum for pw::Status. |
| |
| .. Note:: |
| ``pw::protobuf::StatusCode`` values should not be used outside of a .proto |
| file. Instead, the StatusCodes should be converted to the Status type in the |
| language. In C++, this would be: |
| |
| .. code-block:: c++ |
| |
| // Reading from a proto |
| pw::Status status = static_cast<pw::Status::Code>(proto.status_field)); |
| // Writing to a proto |
| proto.status_field = static_cast<pw::protobuf::StatusCode>(status.code())); |
| |
| ======================================== |
| Comparison with other protobuf libraries |
| ======================================== |
| |
| protobuf-lite |
| ============= |
| protobuf-lite is the official reduced-size C++ implementation of protobuf. It |
| uses a restricted subset of the protobuf library's features to minimize code |
| size. However, is is still around 150K in size and requires dynamic memory |
| allocation, making it unsuitable for many embedded systems. |
| |
| nanopb |
| ====== |
| `nanopb <https://github.com/nanopb/nanopb>`_ is a commonly used embedded |
| protobuf library with very small code size and full code generation. It provides |
| both encoding/decoding functionality and in-memory C structs representing |
| protobuf messages. |
| |
| nanopb works well for many embedded products; however, using its generated code |
| can run into RAM usage issues when processing nontrivial protobuf messages due |
| to the necessity of defining a struct capable of storing all configurations of |
| the message, which can grow incredibly large. In one project, Pigweed developers |
| encountered an 11K struct statically allocated for a single message---over twice |
| the size of the final encoded output! (This was what prompted the development of |
| ``pw_protobuf``.) |
| |
| To avoid this issue, it is possible to use nanopb's low-level encode/decode |
| functions to process individual message fields directly, but this loses all of |
| the useful semantics of code generation. ``pw_protobuf`` is designed to optimize |
| for this use case; it allows for efficient operations on the wire format with an |
| intuitive user interface. |
| |
| Depending on the requirements of a project, either of these libraries could be |
| suitable. |