Author: @mcy
How to use Protobuf Editions to construct a large-scale change that modifies the semantics of Protobuf in some way.
This document describes how to use the Protobuf Editions mechanism (both editions themselves and features) for designing migrations and large-scale changes intended to solve a particular kind of defect in the language.
This document describes how features are defined and used, and the changes to `protoc` needed to support large-scale changes.

There are two kinds of features:

- Global features, defined on `proto.Features`. In this document, we refer to them as `features.<name>`, e.g. `features.enum`.
- Language-scoped features, defined as `features.(<lang>).<name>`, e.g. `features.(proto.cpp).legacy_string`.

Global features require a `descriptor.h` change, and are relatively heavyweight, since defining one also requires providing helpers in `Descriptor` wrapper classes to avoid the need for users to resolve inheritance. Because they are not specific to a language, they need to be carefully and visibly documented.

Language-scoped features require only a change in a backend's feature extension, which has a smaller blast radius (except in C++ and Java). Often these are relevant only for codegen and do not require reflective introspection.
Adding a feature is never a breaking change.
In general, features should have an original default and a desired default: features are intended to gradually flip from one value to another throughout the ecosystem as migrations progress. This is not always true, but this means most features will be bools or enums.
Any migration that introduces a feature should plan to eventually deprecate and remove that feature from both our internal codebase and open source, generally with a multi-year horizon. Features are transient.
Removing a feature is a breaking change, but it does not need to be tied to an edition. Feature removal in OSS must thus be batched into a breaking release. Deletion of a feature should generally be announced to OSS a year in advance.
Here are some things that we could use features for, very broadly:

- `features.(proto.cpp).legacy_string`
- `features.packed`, and eventually `features.group_encoding`
- `features.enum` and `features.utf8`

Although almost any semantic change can be feature-controlled, some things would be a bit tricky to use a feature for.
An edition is a set of default values for all features that `protoc`'s frontend, and its backends, understand. Edition numbers are announced by protobuf-team, but not necessarily defined by us. `protoc` only defines the edition defaults for global features, and each backend defines the edition defaults for its features.
The `FileDescriptorProto.edition` field is a string, so that we can avoid nasty surprises around needing to mint multiple editions per year: even if we mint `edition = "2022";`, we can mint `edition = "2022.1";` in a pinch.
However, protobuf-team does not define editions, it only proclaims them. Third-party backends are responsible for changing defaults across editions. To minimize the amount of synchronization, we introduce a total order on editions.
This means that a backend can pick the default not by looking at the edition, but by asking “is this proto older than this edition, where I introduced this default?”
The total order is thus: the edition string is split on `'.'`. Each component is then ordered first by length, then lexicographically: `a < b` when `a.len < b.len`, or when the lengths are equal and `a` precedes `b` lexicographically. This ensures that `9 < 10`, for example.
By convention, we will make the edition be either the year, like `2022`, or the year followed by a revision, like `2022.1`. Thus, we have the following total ordering on editions:
2022 < 2022.0 < 2022.1 < ... < 2022.9 < 2022.10 < ... < 2023 < ... < 2024 < ...
(Note: The above edition ordering is updated in Edition Naming.)
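Under the splitting rule above, this ordering can be sketched in a few lines of Python (a sketch for illustration, not `protoc`'s actual implementation; the function names are invented):

```python
def edition_key(edition: str) -> list[tuple[int, str]]:
    """Maps an edition string to a key that sorts by the editions total order."""
    # Split on '.'; each component orders first by length, then lexicographically,
    # so that "9" < "10" even though "10" < "9" as plain strings.
    return [(len(part), part) for part in edition.split(".")]

def edition_less(a: str, b: str) -> bool:
    """Returns True if edition `a` comes strictly before edition `b`."""
    return edition_key(a) < edition_key(b)

# The ordering from the text: 2022 < 2022.0 < 2022.9 < 2022.10 < 2023
assert edition_less("2022", "2022.0")
assert edition_less("2022.9", "2022.10")
assert edition_less("2022.10", "2023")
```

A backend's "is this proto older than the edition where I introduced this default?" check reduces to a single `edition_less` comparison against the file's edition string.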
Thus, if an imaginary Haskell backend defines a feature `feature.(haskell).more_monads`, which becomes true in 2023, the backend can ask `file.EditionIsLaterThan("2023")`. If it becomes false in 2023.1, a future version would ask `file.EditionIsBetween("2023", "2023.1")`.
This means that backends only need to change when they make a change to defaults. However, backends cannot add things to editions willy-nilly. A backend can only start observing an edition after protobuf-team proclaims the next edition number, and may not use edition numbers we do not proclaim.
“Proclamation” is done via a two-step process: first, we announce an upcoming edition some months ahead of time to OSS, and give an approximate date on which we plan to release a non-breaking version that causes `protoc` to accept the new edition. Around the time of that release, backends should make a release adding support for that edition, if they want to change a default. It is a faux pas for the meaning of an edition to change long (> 1 month) after it has been released, but ultimately there is no enforcement mechanism.
We promise to proclaim an edition once per calendar year, even if first-party backends will not use it. In the event of an emergency (whatever that means), we can proclaim a `Y.1`, `Y.2`, and so on. Because of the total order, only backends that desperately need a new edition need to pay attention to the announcement. As we gain experience, we should define guidelines for third parties to request an unscheduled edition bump, but for the time being we will deal with things case-by-case.
We may want to have a canonical way of finding out what the latest edition is. It should be included in large print on our landing page, and `protoc --latest-edition` should print the newest edition known to `protoc`. The intent is that tooling that wants to generate `.proto` templates externally can choose to use the latest edition for new messages.
The following are sketches of large-scale change designs for feature changes we would like to execute, presented as example use-cases.
We need to get the ecosystem into the `"editions"` syntax. This migration is probably unique because we are not changing any behavior, just the spelling of a bunch of things.

We also need to track down and upgrade (by hand) any code that is using the value of `syntax`. This will likely be a manual large-scale change performed either by Busy Beavers or a handful of protobuf-team members furnished with appropriate stimulants (coffee, diet Mountain Dew, etc.). Once we have migrated 95% of callers of `syntax`, we will mark all accessors of that field in various languages as deprecated.

Because the value of `syntax` becomes unreliable at this point, this will be a breaking change.
Next, we will introduce the features defined in Edition Zero Features. We will then implement tooling that can take a `proto2` or `proto3` file and add `edition = "2023";` and `option features.* = ...;` as appropriate, so that each file retains its original behavior.
This second large-scale change can be fully automated, and does not require breaking changes.
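For illustration, such an automated upgrade might rewrite a file as follows (a sketch only: the exact set of features the tool emits depends on the Edition Zero defaults, and the field-level spellings here are assumptions):

```proto
// Before: a proto2 file.
syntax = "proto2";

message User {
  required string name = 1;
  optional int32 id = 2;
}

// After: the same file in editions mode, with features added so that
// every field keeps its proto2 behavior.
edition = "2023";

message User {
  string name = 1 [features.field_presence = LEGACY_REQUIRED];
  int32 id = 2;  // Explicit presence is assumed to be the edition default.
}
```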
`required`

We can use features to move fields off of `features.field_presence = LEGACY_REQUIRED` (the edition’s spelling of `required`) and onto `features.field_presence = EXPLICIT_PRESENCE`.
To do this, we introduce a new value for `features.field_presence`, `ALWAYS_SERIALIZE`, which behaves like `EXPLICIT_PRESENCE`, but, if the has-bit is not set, the default is serialized. (This is sort of like a cross between `required` and `proto3` no-label.)
It is always safe to turn a proto from `LEGACY_REQUIRED` to `ALWAYS_SERIALIZE`, because `required` is a constraint on initialization checking, i.e., that the value was present. This means the only requirement is that old readers not break, which is accomplished by always providing a value. Because writers of `required` fields always set the value anyways, this is not a behavioral change, but it now permits writers to veer off of actually setting the value.
After an appropriate build horizon, we can assume that all readers are tolerant of a potentially missing value (even though no writer would actually be omitting it). At this point we can migrate from `ALWAYS_SERIALIZE` to `EXPLICIT_PRESENCE`. If a reader does not see a record for the field, attempting to access it will produce the default value; it is not likely that callers are actually checking for presence of `required` fields, even though that is technically a thing you can do.
Once all required fields have gone through both steps, `LEGACY_REQUIRED` and `ALWAYS_SERIALIZE` can be removed as variants (breaking change).
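The two-step migration can be sketched on a single field; note that `ALWAYS_SERIALIZE` is the value proposed by this document, not an existing one:

```proto
// Step 0: the editions spelling of proto2 `required`.
string name = 1 [features.field_presence = LEGACY_REQUIRED];

// Step 1: stop enforcing initialization, but keep old readers happy by
// serializing a value (the default, if the has-bit is unset) unconditionally.
string name = 1 [features.field_presence = ALWAYS_SERIALIZE];

// Step 2: after the build horizon, ordinary explicit presence.
string name = 1 [features.field_presence = EXPLICIT_PRESENCE];
```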
In C++, a `string`- or `bytes`-typed field has accessors that produce `const std::string&`s. The missed optimizations of doing this are well-understood, so we won't rehash that discussion.

We would like to migrate all of them to return `absl::string_view`, à la `ctype = STRING_PIECE`.
To do this, we introduce `features.(proto.cpp).legacy_string`[^1], a boolean feature that defaults to true. When false on a field of appropriate type, it does the needful and causes accessors to become representationally opaque.

The feature can be set at file or field scope; tooling (see below) can be used to minimize the diff impact of these changes. Changing a field may also require changing code that was previously assuming it could write `std::string x = proto.string_field();`. This has the usual “unspooling string” migration caveats.
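Opting a single field out of the legacy accessors might look like this (a sketch, using the feature spelling proposed above):

```proto
edition = "2023";

message User {
  // With legacy_string = false, the C++ accessor for this field would
  // return absl::string_view rather than const std::string&.
  string name = 1 [features.(proto.cpp).legacy_string = false];
}
```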
Once we have applied 95% of internal changes, we will upgrade the C++ backend at the next edition to default `legacy_string` to false in the new edition. Tooling (again, below) can be used to automatically delete explicit settings of the feature throughout our internal codebase, as a second large-scale change. This can happen in parallel to closing the loop on the last 5% of required internal changes.
Once we have eliminated all the legacy accessors, we will remove the feature (breaking change).
It turns out that encoding and decoding groups (end-marker-delimited submessages) is cheaper than handling length-prefixed messages. There are likely CPU and RAM savings in switching messages to use the group encoding. Unfortunately, that would be a wire-breaking change, causing old readers to be unable to parse new messages.
We can do what we did for `packed`. First, we modify parsers to accept message fields that are encoded as either groups or messages (i.e., `TYPE_MESSAGE` and `TYPE_GROUP` become synonyms in the deserializer). We will let this soak for three years[^2] and bide our time.
After those three years, we can begin a large-scale change to add `features.group_encoded` to message fields throughout our internal codebase (note that groups don't actually exist in editions; they are just messages with `features.group_encoded`). Because of our long waiting period, it is (hopefully) unlikely that old readers will be caught by surprise.
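A field migrated by this large-scale change might look like the following sketch (using this document's proposed feature name):

```proto
edition = "2023";

message Envelope {
  // Encoded on the wire as a group (end-marker-delimited) rather than a
  // length-prefixed submessage; readers updated during the soak period
  // accept either encoding.
  Payload payload = 1 [features.group_encoded = true];
}
```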
Once we are 95% done, we will upgrade `protoc` to set `features.group_encoded` to true by default in new editions. Tooling can be used to clean up features as before.
We will probably never completely eliminate length-prefixed messages, so this is a rare case where the feature lives on forever.
We will need a few different tools for minimizing migration toil, all of which will be released in OSS. These are:
- The features GC. Running `protoc --gc-features foo.proto` on a file in editions mode will compute the minimal (or a heuristically minimal, if this proves expensive) set of features to set on things, given the edition specified in the file. This will produce a Protochangifier `ProtoChangeSpec` that describes how to clean up the file.
- The editions “adopter”. Running `protoc --upgrade-edition -I... file.proto` will figure out how to update `file.proto` from `proto2` or `proto3` to the latest edition, adding features as necessary. It will emit this information as a `ProtoChangeSpec`, implicitly running the features GC.
- The editions “upgrader”. Running `protoc --upgrade-edition` as above on a file that is already in editions mode will bump it up to the latest edition known to `protoc` and add features as necessary. Again, this emits a features-GC'd `ProtoChangeSpec`.
This is by no means all the tooling we need, but it will simplify the work of robots and beavers, along with any bespoke, internal-codebase-specific tooling we build.
We need to export our large-scale changes into open source to have any hope of editions not splitting the ecosystem. It is impossible to do this the way we do large-scale changes in our internal codebase, where we have global approvals and a finite but nonzero supply of bureaucratic sticks to motivate reluctant users.
In OSS, we have neither of these things. The only stick we have is breaking changes, and the only carrots we can offer are new features. There is no “global approval” or “TAP” for OSS.
Our strategy must be a mixture of:
The common theme is comms and making it clear that these are improvements everyone can benefit from, and that there is no “I” in “ecosystem”: using Protobuf, just like using Abseil, means accepting upgrades as a fact of life, not something to be avoided.
We should lean on lessons learned by Go (see: their `go fix` tool) and Rust (see: their `rustfix` tool); Rust in particular has an editions/epoch mechanism like ours; they also have feature gates, but those are not the same concept as our features. We should also lean on the Carbon team's public messaging about upgrading being a fact of life, to provide a unified Google front on the matter from the view of observers.
The design of Protobuf Editions is directly inspired by Rust's own edition system[^3].
Rust defines and ships a new edition every three years, and focuses on changes to the surface language that do not inhibit interop: crates of different editions can always be linked together, and “edition” is a parallel ratchet to the language/compiler version.
For example, keywords (like `async`) have been introduced using editions. Editions have also been used to change the semantics of the borrow checker to allow new programs, and to change name resolution rules to be more intuitive. For Rust, an edition may require changes to existing code to be able to compile again, but only at the point that the crate opts into the new edition, to obtain some benefit from doing so.
Unlike Protobuf, Rust commits to supporting all past editions in perpetuity: there is no ratcheting forward of the whole ecosystem. However, Rust does ship with `rustfix` (runnable on Cargo projects via `cargo fix`), a tool that can upgrade crates to a new edition. Edition changes are required to come with a migration plan to enable `rustfix`.
Crates therefore have limited pressure to upgrade to the latest edition. It provides better features, but because there is no EOL horizon, crates tend to stay on old editions to support old compilers. For users, this is a great story, and allows old code to work indefinitely. However, there is a maintenance burden on the compiler to ensure that old editions and new language features (mostly) work correctly together.
In Rust, macros present a challenge: rich support for interpreted, declarative macros and compiled, fully procedural macros means that macros written for older editions may not work well in crates written on newer editions, or vice versa. There are mitigations for this in the compiler, but such fixes cannot be perfect, so this is a source of difficulties in achieving total conversion. Protobuf does not have macros, but it does have rich descriptors that mirror input files, and this is a potential source of problems to watch out for.
Overall, Rust's migration story is poor: they have accepted that they need to support old editions indefinitely, but only produce an edition every three years. Protobuf plans to be much more aggressive, and we should study where Rust's leniency toward old versions is unavoidable and where it is an explicit design choice.
[^1]: `ctype` has baggage and I am going to ignore it for the purposes of discussion. The feature is spelled `legacy_string` because adding string view accessors is not likely the only thing to do, given we probably want to change the mutators as well.

[^2]: The correct size of the horizon is arbitrary, due to the “budget phones in India” problem. Realistically we would need to pick one, start the migration, and halt it if we encounter problems. It is quite difficult to do better than “hope” as our strategy, but `packed` is an existence proof that this is not insurmountable, merely very expensive.

[^3]: Rust also has feature gates, used mostly so that people may start trying out experimental unstable features. These are largely orthogonal to editions, and tied to compiler versions. Rust's feature gates generally do not change the semantics of existing programs; they just cause new programs to be valid. When a feature is “stabilized”, the feature flag is removed. Feature flags do not participate in Rust's stability promises.