 .. _module-pw_protobuf:

 ===========
 pw_protobuf
 ===========
 The protobuf module provides a lightweight interface for encoding and decoding
 the Protocol Buffer wire format.

 .. note::

   The protobuf module is a work in progress. Wire format encoding and decoding
   is supported, though the APIs are not final. C++ code generation exists for
   encoding, but not decoding.

 Design
 ======
 Unlike other protobuf libraries, which typically provide in-memory data
 structures to represent protobuf messages, ``pw_protobuf`` operates directly on
 the wire format and leaves data storage to the user. This has a few benefits.
 The primary one is that it allows the library to be incredibly small, with the
 encoder and decoder each having a code size of around 1.5K and negligible RAM
 usage. Users can choose the tradeoffs most suitable for their product on top of
 this core implementation.

 ``pw_protobuf`` also provides zero-overhead C++ code generation which wraps its
 low-level wire format operations with a user-friendly API for processing
 specific protobuf messages. The code generation integrates with Pigweed's GN
 build system.

 Configuration
 =============
 ``pw_protobuf`` supports the following configuration options.

 * ``PW_PROTOBUF_CFG_MAX_VARINT_SIZE``:
   When encoding nested messages, the number of bytes to reserve for the varint
   submessage length. Nested messages are limited in size to the maximum value
   that can be varint-encoded into this reserved space.

   The values that can be set, and their corresponding maximum submessage
   lengths, are outlined below.

   +-------------------+----------------------------------------+
   | MAX_VARINT_SIZE   | Maximum submessage length              |
   +===================+========================================+
   | 1 byte            | 127                                    |
   +-------------------+----------------------------------------+
   | 2 bytes           | 16,383 or < 16KiB                      |
   +-------------------+----------------------------------------+
   | 3 bytes           | 2,097,151 or < 2048KiB                 |
   +-------------------+----------------------------------------+
   | 4 bytes (default) | 268,435,455 or < 256MiB                |
   +-------------------+----------------------------------------+
   | 5 bytes           | 4,294,967,295 or < 4GiB (max uint32_t) |
   +-------------------+----------------------------------------+
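
 As a rough check of the table above: each varint byte carries seven payload
 bits, so ``n`` reserved bytes can represent lengths up to ``2^(7n) - 1``, and
 the 5-byte case is capped at the maximum ``uint32_t``. The sketch below
 recomputes the limits at compile time. ``MaxSubmessageLength`` is a
 hypothetical helper written for this illustration, not part of
 ``pw_protobuf``.

 .. code-block:: c++

   #include <cstddef>
   #include <cstdint>
   #include <limits>

   // Hypothetical helper: largest submessage length that fits in
   // `reserved_bytes` varint bytes (valid for 1 to 5 bytes). The result is
   // capped at the maximum uint32_t, matching the last row of the table.
   constexpr std::uint64_t MaxSubmessageLength(std::size_t reserved_bytes) {
     const std::uint64_t by_bits =
         (std::uint64_t{1} << (7 * reserved_bytes)) - 1;
     const std::uint64_t cap = std::numeric_limits<std::uint32_t>::max();
     return by_bits < cap ? by_bits : cap;
   }

   static_assert(MaxSubmessageLength(1) == 127);
   static_assert(MaxSubmessageLength(2) == 16'383);
   static_assert(MaxSubmessageLength(3) == 2'097'151);
   static_assert(MaxSubmessageLength(4) == 268'435'455);
   static_assert(MaxSubmessageLength(5) == 4'294'967'295);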

 ========
 Encoding
 ========

 Usage
 =====
 Pigweed's protobuf encoders encode directly to the wire format of a proto rather
 than staging information to a mutable data structure. This means any writes of a
 value are final, and can't be referenced or modified as a later step in the
 encode process.
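
 To illustrate what writing directly to the wire format means, the following
 standalone sketch appends a varint field to an output buffer by hand. It uses
 only the standard library. ``AppendVarint`` and ``AppendUint32Field`` are
 hypothetical helpers rather than ``pw_protobuf`` APIs, and the real encoders
 cover more field types and error cases.

 .. code-block:: c++

   #include <cstdint>
   #include <vector>

   // Wire type for varint-encoded fields such as uint32.
   constexpr std::uint64_t kWireTypeVarint = 0;

   // Appends `value` as a base-128 varint: seven payload bits per byte,
   // least-significant group first, with the top bit set on every byte except
   // the last.
   void AppendVarint(std::uint64_t value, std::vector<std::uint8_t>& out) {
     while (value >= 0x80) {
       out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
       value >>= 7;
     }
     out.push_back(static_cast<std::uint8_t>(value));
   }

   // Appends a complete uint32 field: the key (field number and wire type)
   // followed by the value. Nothing is staged; the bytes go straight to `out`.
   void AppendUint32Field(std::uint32_t field_number,
                          std::uint32_t value,
                          std::vector<std::uint8_t>& out) {
     const std::uint64_t key =
         (std::uint64_t{field_number} << 3) | kWireTypeVarint;
     AppendVarint(key, out);
     AppendVarint(value, out);
   }

 Under these assumptions, ``AppendUint32Field(1, 150, out)`` appends the bytes
 ``08 96 01``. Once they are written, there is no in-memory message left to
 revisit, which is why a written value cannot be modified later.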

 MemoryEncoder
 =============
 A MemoryEncoder directly encodes a proto to an in-memory buffer.

 .. Code:: cpp

   // Writes a proto response to the provided buffer, returning the encode
   // status and number of bytes written.
   StatusWithSize WriteProtoResponse(ByteSpan response) {
     // All proto writes are directly written to the `response` buffer.
     MemoryEncoder encoder(response);
     encoder.WriteUint32(kMagicNumberField, 0x1a1a2b2b);
     encoder.WriteString(kFavoriteFood, "cookies");
     return StatusWithSize(encoder.status(), encoder.size());
   }

 StreamEncoder
 =============
 pw_protobuf's StreamEncoder class operates on pw::stream::Writer objects to
 serialize proto data. This means you can directly encode a proto to something
 like pw::sys_io without needing to build the complete message in memory first.

 .. Code:: cpp

   #include "pw_protobuf/encoder.h"
   #include "pw_stream/sys_io_stream.h"
   #include "pw_bytes/span.h"
   #include "pw_log/log.h"

   pw::stream::SysIoWriter sys_io_writer;
   pw::protobuf::StreamEncoder my_proto_encoder(sys_io_writer,
                                                pw::ByteSpan());

   // Once this line returns, the field has been written to the Writer.
   my_proto_encoder.WriteInt64(kTimestampFieldNumber, system::GetUnixEpoch());

   // There's no intermediate buffering when writing a string directly to a
   // StreamEncoder.
   my_proto_encoder.WriteString(kWelcomeMessageFieldNumber,
                                "Welcome to Pigweed!");
   if (!my_proto_encoder.status().ok()) {
     PW_LOG_INFO("Failed to encode proto; %s", my_proto_encoder.status().str());
   }

 Nested submessages
 ==================
 Writing proto messages with nested submessages requires buffering due to
 limitations of the proto format. Every proto submessage must know the size of
 the submessage before its final serialization can begin. A streaming encoder can
 be passed a scratch buffer to use when constructing nested messages. All
 submessage data is buffered to this scratch buffer until the submessage is
 finalized. Note that the contents of this scratch buffer are not necessarily
 valid proto data, so don't try to use them directly.
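
 The length-prefix requirement can be sketched with the standard library alone.
 In the example below, the nested message's fields are staged in a scratch
 buffer, and only once its size is known are the key, length, and payload
 written to the output. All names here are hypothetical, and the way the real
 encoder lays out its scratch buffer may differ.

 .. code-block:: c++

   #include <cstdint>
   #include <string_view>
   #include <vector>

   using Bytes = std::vector<std::uint8_t>;

   // Appends `value` as a base-128 varint.
   void AppendVarint(std::uint64_t value, Bytes& out) {
     while (value >= 0x80) {
       out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
       value >>= 7;
     }
     out.push_back(static_cast<std::uint8_t>(value));
   }

   // Appends a length-delimited string field (wire type 2).
   void AppendStringField(std::uint32_t field_number,
                          std::string_view text,
                          Bytes& out) {
     AppendVarint((std::uint64_t{field_number} << 3) | 2, out);
     AppendVarint(text.size(), out);
     out.insert(out.end(), text.begin(), text.end());
   }

   // Writes a buffered submessage to `sink`. The length must precede the
   // payload, so this can only happen after the submessage is complete.
   void FinalizeSubmessage(std::uint32_t field_number,
                           const Bytes& scratch,
                           Bytes& sink) {
     AppendVarint((std::uint64_t{field_number} << 3) | 2, sink);
     AppendVarint(scratch.size(), sink);
     sink.insert(sink.end(), scratch.begin(), scratch.end());
   }

   // Encodes a message whose field 1 is a nested message with two strings.
   Bytes EncodeClientWithOnePet() {
     Bytes sink;     // Stands in for a write-only stream.
     Bytes scratch;  // Stands in for the scratch buffer.
     AppendStringField(1, "Spot", scratch);
     AppendStringField(2, "dog", scratch);
     FinalizeSubmessage(1, scratch, sink);
     return sink;
   }

 Here the nested payload is 11 bytes long, so the output starts with the key
 ``0A`` and the length ``0B`` before any of the staged field data appears.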

 MemoryEncoder objects use the final destination buffer rather than relying on a
 scratch buffer. Note that this means your destination buffer might need
 additional space for overhead incurred by nesting submessages. The
 ``MaxScratchBufferSize()`` helper function can be useful in estimating how much
 space to allocate to account for nested submessage encoding overhead.

 .. Code:: cpp

   #include "pw_protobuf/encoder.h"
   #include "pw_stream/sys_io_stream.h"
   #include "pw_bytes/span.h"
   #include "pw_log/log.h"

   pw::stream::SysIoWriter sys_io_writer;
   // The scratch buffer should be at least as big as the largest nested
   // submessage. It's a good idea to be a little generous.
   std::byte submessage_scratch_buffer[64];

   // Provide the scratch buffer to the proto encoder. The buffer's lifetime must
   // match the lifetime of the encoder.
   pw::protobuf::StreamEncoder my_proto_encoder(sys_io_writer,
                                                submessage_scratch_buffer);

   {
     // Note that the parent encoder, my_proto_encoder, cannot be used until the
     // nested encoder, nested_encoder, has been destroyed.
     pw::protobuf::StreamEncoder nested_encoder =
         my_proto_encoder.GetNestedEncoder(kPetsFieldNumber);

     // There's intermediate buffering when writing to a nested encoder.
     nested_encoder.WriteString(kNameFieldNumber, "Spot");
     nested_encoder.WriteString(kPetTypeFieldNumber, "dog");

     // When this scope ends, the nested encoder is serialized to the Writer.
     // In addition, the parent encoder, my_proto_encoder, can be used again.
   }

   // If an encode error occurs when encoding the nested messages, it will be
   // reflected at the root encoder.
   if (!my_proto_encoder.status().ok()) {
     PW_LOG_INFO("Failed to encode proto; %s", my_proto_encoder.status().str());
   }

 .. warning::
   When a nested submessage is created, any use of the parent encoder that
   created the nested encoder will trigger a crash. To resume using the parent
   encoder, destroy the submessage encoder first.

 Error Handling
 ==============
 While individual write calls on a proto encoder return pw::Status objects, the
 encoder tracks all status returns and "latches" onto the first error
 encountered. This status can be accessed via ``StreamEncoder::status()``.
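
 The latching behavior can be pictured with the following standalone sketch.
 ``Status`` and ``LatchingEncoder`` are simplified, hypothetical stand-ins for
 ``pw::Status`` and the real encoders. Whether the real encoder keeps accepting
 writes after a failure is not covered here; the sketch only shows that the
 first error is the one retained.

 .. code-block:: c++

   #include <cstddef>

   enum class Status { kOk, kResourceExhausted, kInvalidArgument };

   // Hypothetical encoder that only tracks how much space is left.
   class LatchingEncoder {
    public:
     explicit LatchingEncoder(std::size_t capacity) : remaining_(capacity) {}

     // Each write reports its own result, but only the first failure is
     // stored in the latched status.
     Status Write(std::size_t num_bytes) {
       Status result = Status::kOk;
       if (num_bytes == 0) {
         result = Status::kInvalidArgument;
       } else if (num_bytes > remaining_) {
         result = Status::kResourceExhausted;
       } else {
         remaining_ -= num_bytes;
       }
       if (status_ == Status::kOk) {
         status_ = result;  // Latch the first error; later ones are ignored.
       }
       return result;
     }

     Status status() const { return status_; }

    private:
     std::size_t remaining_;
     Status status_ = Status::kOk;
   };

 With a capacity of 4 bytes, writing 3 bytes succeeds, writing 5 bytes fails
 with ``kResourceExhausted``, and a later zero-byte write fails with
 ``kInvalidArgument``. ``status()`` still reports ``kResourceExhausted``, so a
 single check at the end of encoding is enough to detect a failure.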

 Codegen
 =======
 pw_protobuf encoder codegen integration is supported in GN, Bazel, and CMake.
 The codegen is just a light wrapper around the ``StreamEncoder`` and
 ``MemoryEncoder`` objects, providing named helper functions to write proto
 fields rather than requiring that field numbers are directly passed to an
 encoder. Namespaced proto enums are also generated, and used as the arguments
 when writing enum fields of a proto message.

 All generated messages provide a ``Fields`` enum that can be used directly for
 out-of-band encoding, or with the ``pw::protobuf::Decoder``.

 This module's codegen is available through the ``*.pwpb`` sub-target of a
 ``pw_proto_library`` in GN, CMake, and Bazel. See :ref:`pw_protobuf_compiler's
 documentation <module-pw_protobuf_compiler>` for more information on build
 system integration for pw_protobuf codegen.

 Example ``BUILD.gn``:

 .. Code:: none

   import("//build_overrides/pigweed.gni")

   import("$dir_pw_build/target_types.gni")
   import("$dir_pw_protobuf_compiler/proto.gni")

   # This target controls where the *.pwpb.h headers end up on the include path.
   # In this example, it's at "pet_daycare_protos/client.pwpb.h".
   pw_proto_library("pet_daycare_protos") {
     sources = [
       "pet_daycare_protos/client.proto",
     ]
   }

   pw_source_set("example_client") {
     sources = [ "example_client.cc" ]
     deps = [
       ":pet_daycare_protos.pwpb",
       dir_pw_bytes,
       dir_pw_stream,
     ]
   }

 Example ``pet_daycare_protos/client.proto``:

 .. Code:: none

   syntax = "proto3";
   // The proto package controls the namespacing of the codegen. If this package
   // were fuzzy.friends, the namespace for codegen would be fuzzy::friends::*.
   package fuzzy_friends;

   message Pet {
     string name = 1;
     string pet_type = 2;
   }

   message Client {
     repeated Pet pets = 1;
   }

 Example ``example_client.cc``:

 .. Code:: cpp

   #include "pet_daycare_protos/client.pwpb.h"
   #include "pw_protobuf/encoder.h"
   #include "pw_stream/sys_io_stream.h"
   #include "pw_bytes/span.h"
   #include "pw_log/log.h"

   pw::stream::SysIoWriter sys_io_writer;
   std::byte submessage_scratch_buffer[64];
   // The constructor is the same as a pw::protobuf::StreamEncoder.
   fuzzy_friends::Client::StreamEncoder client(sys_io_writer,
                                               submessage_scratch_buffer);
   {
     fuzzy_friends::Pet::StreamEncoder pet1 = client.GetPetsEncoder();
     pet1.WriteName("Spot");
     pet1.WritePetType("dog");
   }

   {
     fuzzy_friends::Pet::StreamEncoder pet2 = client.GetPetsEncoder();
     pet2.WriteName("Slippers");
     pet2.WritePetType("rabbit");
   }

   if (!client.status().ok()) {
     PW_LOG_INFO("Failed to encode proto; %s", client.status().str());
   }

 ========
 Decoding
 ========
 ``pw_protobuf`` provides two decoder implementations, which are described below.

 Decoder
 =======
 The ``Decoder`` class operates on a protobuf message located in a buffer in
 memory. It provides an iterator-style API for processing a message. Calling
 ``Next()`` advances the decoder to the next proto field, which can then be read
 by calling the appropriate ``Read*`` function for the field number.

 When reading ``bytes`` and ``string`` fields, the decoder returns a view of that
 field within the buffer; no data is copied out.

 .. note::

   ``pw::protobuf::Decoder`` will soon be renamed ``pw::protobuf::MemoryDecoder``
   for clarity and consistency.

 .. code-block:: c++

   #include "pw_protobuf/decoder.h"
   #include "pw_status/try.h"

   pw::Status DecodeProtoFromBuffer(std::span<const std::byte> buffer) {
     pw::protobuf::Decoder decoder(buffer);
     pw::Status status;

     uint32_t uint32_field;
     std::string_view string_field;

     // Iterate over the fields in the message. A return value of OK indicates
     // that a valid field has been found and can be read. When the decoder
     // reaches the end of the message, Next() will return OUT_OF_RANGE.
     // Other return values indicate an error trying to decode the message.
     while ((status = decoder.Next()).ok()) {
       switch (decoder.FieldNumber()) {
         case 1:
           PW_TRY(decoder.ReadUint32(&uint32_field));
           break;
         case 2:
           // The passed-in string_view will point to the contents of the string
           // field within the buffer.
           PW_TRY(decoder.ReadString(&string_field));
           break;
       }
     }

     // Do something with the fields...

     return status.IsOutOfRange() ? pw::OkStatus() : status;
   }
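
 To make the zero-copy behavior concrete, the following standalone sketch walks
 the wire format by hand and returns a view into the original buffer for a
 string field. It is a simplified illustration that handles only varint and
 length-delimited fields. ``ConsumeVarint`` and ``FindStringField`` are
 hypothetical helpers, not ``pw_protobuf`` APIs.

 .. code-block:: c++

   #include <cstddef>
   #include <cstdint>
   #include <optional>
   #include <string_view>

   // Reads a varint from the front of `data` and removes the consumed bytes.
   // Returns std::nullopt if the input is truncated or longer than 10 bytes.
   std::optional<std::uint64_t> ConsumeVarint(std::string_view& data) {
     std::uint64_t value = 0;
     for (std::size_t i = 0; i < data.size() && i < 10; ++i) {
       const auto byte = static_cast<std::uint8_t>(data[i]);
       value |= static_cast<std::uint64_t>(byte & 0x7Fu) << (7 * i);
       if ((byte & 0x80u) == 0) {
         data.remove_prefix(i + 1);
         return value;
       }
     }
     return std::nullopt;
   }

   // Finds the first length-delimited field with `field_number`. The returned
   // view points into `message`; no bytes are copied.
   std::optional<std::string_view> FindStringField(
       std::string_view message, std::uint32_t field_number) {
     while (!message.empty()) {
       const auto key = ConsumeVarint(message);
       if (!key) return std::nullopt;
       const std::uint64_t wire_type = *key & 0x7;
       if (wire_type == 0) {  // Varint: skip over the value.
         if (!ConsumeVarint(message)) return std::nullopt;
       } else if (wire_type == 2) {  // Length-delimited.
         const auto length = ConsumeVarint(message);
         if (!length || *length > message.size()) return std::nullopt;
         if ((*key >> 3) == field_number) return message.substr(0, *length);
         message.remove_prefix(*length);
       } else {
         return std::nullopt;  // Other wire types are omitted from this sketch.
       }
     }
     return std::nullopt;
   }

 Because the result is only a view, it remains valid only as long as the
 underlying buffer does. The same lifetime consideration applies to views
 obtained from an in-memory decoder.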

 StreamDecoder
 =============
 Sometimes, a serialized protobuf message may be too large to fit into an
 in-memory buffer. To facilitate working with that type of data, ``pw_protobuf``
 provides a ``StreamDecoder`` which reads data from a
 ``pw::stream::SeekableReader``.

 .. admonition:: When to use a stream decoder

   The ``StreamDecoder`` should only be used in cases where the protobuf data
   cannot be read directly from a buffer. It is inadvisable to use a
   ``StreamDecoder`` with a ``MemoryStream`` --- the decoding operations will be
   far less efficient than the ``Decoder``, which is optimized for in-memory
   messages.

 The general usage of a ``StreamDecoder`` is similar to the basic ``Decoder``,
 with the exception of ``bytes`` and ``string`` fields, which must be copied out
 of the stream into a provided buffer.

 .. code-block:: c++

   #include "pw_protobuf/decoder.h"
   #include "pw_status/try.h"

   pw::Status DecodeProtoFromStream(pw::stream::SeekableReader& reader) {
     pw::protobuf::StreamDecoder decoder(reader);
     pw::Status status;

     uint32_t uint32_field;
     char string_field[16];

     // Iterate over the fields in the message. A return value of OK indicates
     // that a valid field has been found and can be read. When the decoder
     // reaches the end of the message, Next() will return OUT_OF_RANGE.
     // Other return values indicate an error trying to decode the message.
     while ((status = decoder.Next()).ok()) {
       // FieldNumber() returns a Result<uint32_t> as it may fail sometimes.
       // However, FieldNumber() is guaranteed to be valid after a call to Next()
       // that returns OK, so the value can be used directly here.
       switch (decoder.FieldNumber().value()) {
         case 1: {
           pw::Result<uint32_t> result = decoder.ReadUint32();
           if (result.ok()) {
             uint32_field = result.value();
           }
           break;
         }

         case 2:
           // The string field is copied into the provided buffer. If the buffer
           // is too small to fit the string, RESOURCE_EXHAUSTED is returned and
           // the decoder is not advanced, allowing the field to be re-read.
           PW_TRY(decoder.ReadString(string_field));
           break;
       }
     }

     // Do something with the fields...

     return status.IsOutOfRange() ? pw::OkStatus() : status;
   }
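
 The copy-out behavior for ``bytes`` and ``string`` fields can be sketched with
 a standard ``std::istream`` standing in for the seekable reader. In this
 hypothetical example, a payload that does not fit the caller's buffer causes
 the stream to be rewound so the same field can be read again. Whether the real
 decoder achieves this by seeking is an assumption, and the one-byte length
 prefix is a simplification.

 .. code-block:: c++

   #include <cstddef>
   #include <istream>

   enum class ReadStatus { kOk, kResourceExhausted, kDataLoss };

   // Hypothetical helper: copies a length-prefixed payload from `in` into
   // `buffer`. For simplicity, the length prefix must fit in a single varint
   // byte (less than 128).
   ReadStatus ReadDelimited(std::istream& in,
                            char* buffer,
                            std::size_t buffer_size,
                            std::size_t& bytes_read) {
     const std::istream::pos_type start = in.tellg();
     const int length = in.get();
     if (length == std::istream::traits_type::eof() || length >= 0x80) {
       return ReadStatus::kDataLoss;
     }
     if (static_cast<std::size_t>(length) > buffer_size) {
       // Rewind so the field can be re-read with a larger buffer.
       in.seekg(start);
       return ReadStatus::kResourceExhausted;
     }
     in.read(buffer, length);
     if (in.gcount() != length) {
       return ReadStatus::kDataLoss;
     }
     bytes_read = static_cast<std::size_t>(length);
     return ReadStatus::kOk;
   }

 Reading a 5-byte payload into a 4-byte buffer reports ``kResourceExhausted``
 and leaves the stream where it started. A second call with a 16-byte buffer
 then succeeds.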

 The ``StreamDecoder`` can also return a ``pw::stream::SeekableReader`` for
 reading bytes fields, avoiding the need to copy data out directly.

 .. code-block:: c++

   if (decoder.FieldNumber().value() == 3) {
     // bytes my_bytes_field = 3;
     pw::protobuf::StreamDecoder::BytesReader bytes_reader =
         decoder.GetBytesReader();

     // Read data incrementally through the bytes_reader. While the reader is
     // active, any attempts to use the decoder will result in a crash. When the
     // reader goes out of scope, it will close itself and reactivate the decoder.
   }

 If the current field is a nested protobuf message, the ``StreamDecoder`` can
 provide a decoder for the nested message. While the nested decoder is active,
 its parent decoder cannot be used.

 .. code-block:: c++

   if (decoder.FieldNumber().value() == 4) {
     pw::protobuf::StreamDecoder nested_decoder = decoder.GetNestedDecoder();

     while (nested_decoder.Next().ok()) {
       // Process the nested message.
     }

     // Once the nested decoder goes out of scope, it closes itself, and the
     // parent decoder can be used again.
   }

 Proto map encoding utils
 ========================

 Some additional helpers for encoding more complex but common protobuf
 submessages (e.g. map<string, bytes>) are provided in
 ``pw_protobuf/map_utils.h``.

 .. Note::
   The helper APIs are currently in development and may not remain stable.

 Message
 =======

 The module implements a message parsing class, ``Message``, in
 ``pw_protobuf/message.h`` to facilitate proto message parsing and field access.
 The class provides interfaces for searching for fields in a proto message and
 creating helper classes for them according to their interpreted field type,
 e.g. uint32, bytes, string, map<>, or repeated. The class works on top of
 ``StreamDecoder`` and thus requires a ``pw::stream::SeekableReader`` for proto
 message access. The following examples use the class to process different
 fields in a proto message:

 .. code-block:: c++

   // Consider the proto messages defined as follows:
   //
   // message Nested {
   //   string nested_str = 1;
   //   bytes nested_bytes = 2;
   // }
   //
   // message {
   //   uint32 integer = 1;
   //   string str = 2;
   //   bytes bytes = 3;
   //   Nested nested = 4;
   //   repeated string rep_str = 5;
   //   repeated Nested rep_nested  = 6;
   //   map<string, bytes> str_to_bytes = 7;
   //   map<string, Nested> str_to_nested = 8;
   // }

   // Given a seekable `reader` that reads the top-level proto message, and
   // a <proto_size> that gives the size of the proto message:
   Message message(reader, proto_size);

   // Parse a proto integer field.
   Uint32 integer = message.AsUint32(1);
   if (!integer.ok()) {
     // Handle the parsing error, e.g. return integer.status().
   }
   uint32_t integer_value = integer.value();  // The parsed value.

   // Parse a string field.
   String str = message.AsString(2);
   if (!str.ok()) {
     // Handle the parsing error, e.g. return str.status().
   }

   // Check whether the string equals "foo".
   Result<bool> str_check = str.Equal("foo");

   // Parse a bytes field.
   Bytes bytes = message.AsBytes(3);
   if (!bytes.ok()) {
     // Handle the parsing error, e.g. return bytes.status().
   }

   // Get a reader to the bytes.
   stream::IntervalReader bytes_reader = bytes.GetBytesReader();

   // Parse nested message `Nested nested = 4;`
   Message nested = message.AsMessage(4);
   // Get the fields in the nested message.
   String nested_str = nested.AsString(1);
   Bytes nested_bytes = nested.AsBytes(2);

   // Parse repeated field `repeated string rep_str = 5;`
   RepeatedStrings rep_str = message.AsRepeatedString(5);
   // Iterate through the entries.
   for (String element : rep_str) {
     // Process element
   }

   // Parse repeated field `repeated Nested rep_nested = 6;`
   RepeatedMessages rep_nested = message.AsRepeatedMessages(6);
   // Iterate through the entries.
   for (Message element : rep_nested) {
     // Process element
   }

   // Parse map field `map<string, bytes> str_to_bytes = 7;`
   StringToBytesMap str_to_bytes = message.AsStringToBytesMap(7);
   // Access the entry by a given key value
   Bytes bytes_for_key = str_to_bytes["key"];
   // Or iterate through map entries
   for (StringToBytesMapEntry entry : str_to_bytes) {
     String key = entry.Key();
     Bytes value = entry.Value();
     // process entry
   }

   // Parse map field `map<string, Nested> str_to_nested = 8;`
   StringToMessageMap str_to_nested = message.AsStringToMessageMap(8);
   // Access the entry by a given key value
   Message nested_for_key = str_to_nested["key"];
   // Or iterate through map entries
   for (StringToMessageMapEntry entry : str_to_nested) {
     String key = entry.Key();
     Message value = entry.Value();
     // process entry
   }

 The methods in ``Message`` for parsing a single field, i.e. every ``AsXXX()``
 method except ``AsRepeatedXXX()`` and ``AsStringToXXXMap()``, internally perform
 a linear scan of the entire proto message to find the field with the given
 field number. This can be expensive if performed multiple times, especially
 on a slow reader. The same applies to the ``operator[]`` of the
 ``StringToXXXMap`` helper classes. Therefore, for performance reasons, it is
 recommended to use the following range-based for loop whenever possible to
 iterate over and process single fields directly.


 .. code-block:: c++

   for (Message::Field field : message) {
     if (field.field_number() == 1) {
       Uint32 integer = field.As<Uint32>();
       ...
     } else if (field.field_number() == 2) {
       String str = field.As<String>();
       ...
     } else if (field.field_number() == 3) {
       Bytes bytes = field.As<Bytes>();
       ...
     } else if (field.field_number() == 4) {
       Message nested = field.As<Message>();
       ...
     }
   }
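
 The cost difference between repeated lookups and a single pass can be modeled
 with a small standalone sketch. It represents a message as a list of fields
 and counts how many fields each approach examines. ``Field``, ``Find``,
 ``CostOfRepeatedLookups``, and ``CostOfSinglePass`` are hypothetical names
 used only for this illustration.

 .. code-block:: c++

   #include <cstddef>
   #include <cstdint>
   #include <vector>

   struct Field {
     std::uint32_t number;
     std::uint32_t value;
   };

   // Models a lookup by field number: every call scans from the start of the
   // message. `fields_visited` accumulates the number of fields examined.
   const Field* Find(const std::vector<Field>& message,
                     std::uint32_t number,
                     std::size_t& fields_visited) {
     for (const Field& field : message) {
       ++fields_visited;
       if (field.number == number) {
         return &field;
       }
     }
     return nullptr;
   }

   // Looks up fields 1 through `count` one at a time.
   std::size_t CostOfRepeatedLookups(const std::vector<Field>& message,
                                     std::uint32_t count) {
     std::size_t fields_visited = 0;
     for (std::uint32_t number = 1; number <= count; ++number) {
       Find(message, number, fields_visited);
     }
     return fields_visited;
   }

   // Handles the same fields in a single pass over the message.
   std::size_t CostOfSinglePass(const std::vector<Field>& message) {
     std::size_t fields_visited = 0;
     for (const Field& field : message) {
       ++fields_visited;
       (void)field;  // A real loop would dispatch on field.number here.
     }
     return fields_visited;
   }

 For a message with fields 1 through 4 in order, the repeated lookups examine
 1 + 2 + 3 + 4 = 10 fields, while the single pass examines 4. On a slow reader,
 each examined field may also involve additional reads and seeks.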


 .. Note::
   The helper APIs are currently in development and may not remain stable.

 Size report
 ===========

 Full size report
 ----------------

 This report demonstrates the size of using the entire decoder with all of its
 decode methods and a decode callback for a proto message containing each of the
 protobuf field types.

 .. include:: size_report/decoder_full


 Incremental size report
 -----------------------

 This report is generated using the full report as a base and adding some int32
 fields to the decode callback to demonstrate the incremental cost of decoding
 fields in a message.

 .. include:: size_report/decoder_incremental

 ==========================
 Available protobuf modules
 ==========================
 There are a handful of messages ready to be used in Pigweed projects. These are
 located in ``pw_protobuf/pw_protobuf_protos``.

 common.proto
 ============
 Contains the ``Empty`` message proto used in many RPC calls.


 status.proto
 ============
 Contains the enum for pw::Status.

 .. Note::
  ``pw::protobuf::StatusCode`` values should not be used outside of a .proto
  file. Instead, the StatusCodes should be converted to the Status type in the
  language. In C++, this would be:

   .. code-block:: c++

     // Reading from a proto
     pw::Status status = static_cast<pw::Status::Code>(proto.status_field);
     // Writing to a proto
     proto.status_field = static_cast<pw::protobuf::StatusCode>(status.code());

 ========================================
 Comparison with other protobuf libraries
 ========================================

 protobuf-lite
 =============
 protobuf-lite is the official reduced-size C++ implementation of protobuf. It
 uses a restricted subset of the protobuf library's features to minimize code
 size. However, it is still around 150K in size and requires dynamic memory
 allocation, making it unsuitable for many embedded systems.

 nanopb
 ======
 `nanopb <https://github.com/nanopb/nanopb>`_ is a commonly used embedded
 protobuf library with very small code size and full code generation. It provides
 both encoding/decoding functionality and in-memory C structs representing
 protobuf messages.

 nanopb works well for many embedded products; however, using its generated code
 can run into RAM usage issues when processing nontrivial protobuf messages due
 to the necessity of defining a struct capable of storing all configurations of
 the message, which can grow incredibly large. In one project, Pigweed developers
 encountered an 11K struct statically allocated for a single message---over twice
 the size of the final encoded output! (This was what prompted the development of
 ``pw_protobuf``.)

 To avoid this issue, it is possible to use nanopb's low-level encode/decode
 functions to process individual message fields directly, but this loses all of
 the useful semantics of code generation. ``pw_protobuf`` is designed to optimize
 for this use case; it allows for efficient operations on the wire format with an
 intuitive user interface.

 Depending on the requirements of a project, either of these libraries could be
 suitable.