| # What are Protobuf Editions? |
| |
| **Authors**: [@mcy](https://github.com/mcy), [@fowles](https://github.com/fowles) |
| |
| ## Summary |
| |
| This document is an introduction to the Protobuf Editions project, an ambitious |
| re-imagining of how we migrate Protobuf users into the future. |
| |
| ## Goal |
| |
| Enable incremental evolution of Protobuf across the entire ecosystem **without** |
| introducing permanent forks in the Protobuf language. |
| |
| ## TL;DR |
| |
| 1. We are replacing |
| [`syntax`](https://protobuf.dev/reference/protobuf/proto3-spec/#syntax) `= |
| ...` with `edition = ...`. |
| * We plan to produce a new "edition" on a roughly yearly basis. |
| * We plan to regularly deprecate and remove old editions after a wide |
| horizon. |
| * This gradual churn is enabled by the |
| [Protobuf Breaking Changes policy](https://protobuf.dev/news/2022-07-06/#library-breaking-change-policy). |
| 2. "Features" are a special kind of file/message/field/enum/etc option. |
| * Features control the individual codegen and runtime behavior of fields, |
| messages, enums, etc. |
| * Features cannot introduce changes that would directly break existing |
| binaries. |
| * We expect heavy churn of features in `.proto` files, so their design is |
| optimized to minimize diffs to `.proto` files while permitting |
| fine-grained control. |
| * Features are **usually** attached to the field/message/enum they apply |
| to. |
| * Features can be specified at a higher-level entity, such as a file, |
| to apply to all definitions inside of that entity. This is called |
| **feature inheritance**. |
| * Inheritance is intended to allow us to factor frequently-occurring |
| feature declarations, minimizing clutter during migrations. |
| 3. Editions change only the defaults of features and do not otherwise introduce |
| new behavior. |
| * New behavior is fundamentally controlled by features (explicitly set or |
| implicit from an edition). |
| * Editions allow us to ratchet the ecosystem forward. |
| * Editions can be incremented on a per `.proto` file basis; projects |
| can upgrade incrementally. |
| 4. Messages with any permutation of features are always interoperable (they can |
| import each other freely and use messages from each other). |
| * Editions do not split the ecosystem, and migration is largely automated. |
| * Directly inspired by |
| [Rust editions](https://doc.rust-lang.org/edition-guide/editions/index.html). |
| * Carbon has a similar philosophy |
| 5. The `proto2`/`proto3` distinction is going away. |
| * Editions will support everything from both and allow mixed semantics |
| even within the same message or field. |
| * Undesirable features will be LSC'd away, using the same template as any |
| other feature/edition migration. |
| |
| ## Motivation |
| |
| Arguably the biggest hard-earned lesson among Software Foundations is that |
| successful migrations are incremental. Most of our experience with these has |
| been for internal migrations. Externally, progress has often ossified because of |
| a lack of established evolution mechanisms. More recently large projects have |
| started planning incremental evolution into their structure. For example, Carbon |
| is heavily focused on evolution as a core precept, and Rust has built language |
| evolution via editions into its core design.. |
| |
| Protobuf is one of Google's oldest and most successful toolchain projects. |
| However, it was designed before we learned and internalized this lesson, making |
| modernization difficult and haphazard. We still have `required` and `group`, |
| `packed` is not everywhere, and string accessors in C++ still return `const |
| std::string&`. The last radical change to Protobuf (`syntax = "proto3";`) split |
| the ecosystem. |
| |
| *Editions* and *features* are new language features that will allow us to |
| incrementally evolve Protobuf into the future. This will be done by introducing |
| a new `syntax`, hopefully the last syntax addition we will ever need. |
| |
| This high-level document is intended as an introduction to Protobuf Editions for |
| engineers not familiar with the background and the set of tradeoffs that lead us |
| here. Low-level technical details are skipped in preference to describing the |
| kernel of our proposed design. This document reflects the approximate consensus |
| of protobuf-team members who have been developing Protobuf Editions, but please |
| beware: many open questions remain. |
| |
| ## What is a feature? |
| |
| A *feature*, in the narrow context of Protobuf Editions, is an `option` on any |
| syntax entity of a `.proto` file that has the following properties: |
| |
| * It is a field or extension of a top-level option named `features`, which is |
| present on every syntax entity (file, message, enum, field, etc). It can be |
| of any type, but `bool` and `enum` are the most common. |
| * If a syntax entity's lexical parent has a particular value for a feature, |
| then the child has the same value, unless the feature has a new value |
| specified on the child, explicitly. This is called **feature inheritance**, |
| and applies recursively. Features can specify a new value at any of the |
| points where a feature can be added. |
| * It explicitly specifies what syntax entities it can be set on, similar to |
| Java annotations (although this does not preclude inheritance to or through |
| an entity that it *cannot* be set on). |
| |
| Features allow us to control the behavior of `protoc`, its backends, and the |
| Protobuf runtimes at arbitrary granularity. This is critical for large-scale |
| changes: if a message has few usages, features can be changed at a bigger scope, |
| minimizing diff churn, but if it has heavy usage and the CL to migrate a single |
| field is large, cleanups can happen at the field level, as necessary. |
| |
| Features won't change a message’s serialization formats (binary, text, or json) |
| in incompatible ways except for extreme circumstances that will always be |
| managed directly by protobuf-team. It is critical for migrations that any |
| behavioral change coming from a feature is the result of a textual change to a |
| `.proto` file (either an edition bump or a feature change). |
| |
| `ctype` is an existing field option that looks exactly like a feature: it |
| controls the behavior of the codegen backend, although it does not have the nice |
| ratcheting properties of editions. |
| |
| Because features can be extensions, language backends can specify |
| **language-scoped** features. For example, `[ctype = CORD]` could instead be |
| phrased as `[features.(pb.cpp).string_type = CORD]`. Codegen backends own the |
| definitions of their features. |
| |
| ## What is an Edition? |
| |
| An *edition* is a collection of defaults for features understood by `protoc` and |
| its backends. Editions are year-numbered, although we have defined a breakout in |
| case we need multiple editions in a particular year. |
| |
| Instead of writing `syntax = "...";`, a Protobuf Editions-enabled `.proto` file |
| begins with `edition = "2022";` or similar. `edition` implies `syntax = |
| "editions";`, and the `syntax` keyword itself becomes deprecated. This is to |
| ensure that old tools not owned by protobuf-team, which only work for old |
| Protobuf syntaxes, crash or fail quickly and noticeably, instead of wandering |
| into a descriptor that they cannot understand (we will attempt to migrate what |
| we can, of course). |
| |
| `protoc` specifies which editions it understands, and will reject `.proto` files |
| "from the future", since it cannot meaningfully parse them. `protoc` backends, |
| which can specify their own set of language-scoped features, must advertise the |
| defaults for a particular edition that they understand (and reject editions that |
| they don't). Runtimes must be able to handle descriptors "from the future"; this |
| only means that upon encountering a descriptor with an edition or feature it |
| does not understand, there must be a reasonable fallback for the runtime's |
| behavior. |
| |
| ### What is an Edition used for? |
| |
| Editions provide the fundamental increments for the lifecycle of a feature. At |
| this point it is important to reiterate that most features will be specific to |
| particular code generators. What follows is an example life cycle for a |
| theoretical feature–`features.(pb.cpp).opaque_repeated_fields`. |
| |
| 1. Edition “2025” creates `features.(pb.cpp).opaque_repeated_fields` with a |
| default value of `false`. This value is equivalent to the behavior from |
| editions less than “2025”. |
| |
| a. The migration to edition “2025” across google will move very fast as it |
| is a no-op. |
| |
| 2. Migration begins for `features.(pb.cpp).opaque_repeated_fields` (each change |
| in this migration will add `features.(pb.cpp).opaque_repeated_fields = true` |
| and be paired with code changes required to C++ code). It is not anticipated |
| that protos shared between repos will undergo field by field migrations like |
| this as that would cause a large stream of breaking changes, see |
| [Protobuf Editions for schema producers](protobuf-editions-for-schema-producers.md) |
| for more details. |
| |
| 3. Edition “2027” switches the default of |
| `features.(pb.cpp).opaque_repeated_fields` to `true`. |
| |
| a. The migration to “2027” will remove explicit uses of |
| `features.(pb.cpp).opaque_repeated_fields = true` and add explicit uses of |
| `features.(pb.cpp).opaque_repeated_fields = false` where they were implicit |
| before. As above, this migration will be a no-op, so it will move very fast. |
| |
| b. Externally, we will release tools and migration guides for OSS customers. |
| The tools will not be fully turnkey, but should provide a strong starting |
| point for user migrations. |
| |
| 4. Migration continues for `features.(pb.cpp).opaque_repeated_fields` (each |
| change in this migration will remove |
| `features.(pb.cpp).opaque_repeated_fields = false` and be paired with code |
| changes required to C++ code). |
| |
| 5. At some point, usage will be officially roped off internally, and |
| externally. |
| |
| a. Internally, `features.(pb.cpp).opaque_repeated_fields` usage will be |
| blocked with allowlists while we remove the hardest to migrate case. |
| |
| b. Externally, `features.(pb.cpp).opaque_repeated_fields` will be marked |
| deprecated in a public edition and removed in a later one. When a feature is |
| removed, the code generators for that behavior and the runtime libraries |
| that support it may also be removed. In this hypothetical, that might be |
| deprecated in “2029” and removed in “2031”. Any release that removes support |
| for a feature would be a major version bump. |
| |
| The key point to note here is that any `.proto` file that does not use |
| deprecated features has a no-op upgrade from one edition to the next and we will |
| provide tools to effect that upgrade. Internal users will be migrated centrally |
| before a feature is deprecated. External users will have the full window of the |
| Google migration as well as the deprecation window to upgrade their own code. |
| |
| It is also important to note that external users will not receive compiler |
| warnings until the feature is actually deprecated, so we provide a period of |
| deprecation to ensure that they have time to update their code before forcing |
| them to upgrade for an edition update. |
| |
| Separately from feature evolution, `protoc` itself may remove support for old |
| editions entirely after a suitably long window (like 10 years). |
| |
| ## Edition Zero |
| |
| The first edition of Protobuf Editions, the so-called "edition zero", will |
| effectively be a "`proto4`" that introduces the new syntax, and merges the |
| semantics of `proto2` and `proto3`. In editions mode, everything that was |
| possible in `proto2` and `proto3` will be possible, and the handful of |
| irreconcilable differences will be expressed as features. |
| |
| For example, whether values not specified in an `enum` go into unknown fields vs |
| producing an enum value outside of the bounds of the specified values in the |
| `.proto` file (i.e., so-called closed and open enums) will be controlled by |
| `feature.enum = OPEN` or `feature.enum = CLOSED`. |
| |
| Edition Zero should be viewed as the "completion" of the union of `proto2` and |
| `proto3`: it contains both syntaxes as subsets (although with different |
| spellings to disambiguate things) as well as new behavior that was previously |
| inexpressible but which is an obvious consequence of allowing everything from |
| both. For example, `proto3`-style non-optional singular fields could allow |
| non-zero defaults. |
| |
| Edition Zero is designed in such a way that we can mechanically migrate an |
| arbitrary `.proto` file from either `proto2` or `proto3` with no behavioral |
| changes, by replacing `syntax` with `edition` and adding features in the |
| appropriate locations. |
| |
| This will form the foundation of Protobuf Editions and the torrent of parallel |
| migrations that will follow. |
| |
| ## FAQ |
| |
| ### I only interact with protos by moving them around and editing schemata. How does this affect me? |
| |
| This will manifest as a handful of new `option`s appearing at the top of your |
| files. Going forward, expect new `options` to appear and disappear from your |
| `.proto` files as LSCs march across the codebase. We intend to minimize |
| disruption, and you should be able to safely ignore them. |
| |
| In general, you should not need to add `option`s yourself unless we say so in |
| documentation. We will try to make sure tooling recommends the latest edition |
| when creating new files. |
| |
| ### Are you taking away <thing>? |
| |
| Everything expressible today will remain so in Edition Zero. Some syntax will |
| change: we will have only one way of spelling a singular field (with `optional` |
| vs. the `proto3` behavior vs. `required` controlled by a feature), `group`s will |
| turn into sub message fields with a special encoding. |
| |
| ### I think <thing> from proto{2,3} is bad. Why are you letting people use it in my files? |
| |
| Long-term bifurcation of the language has resulted in significant damage to the. |
| ecosystem and engineers' mental model of Protobuf. There are features we think |
| are questionable, too, and we want to remove them. But we need to break some |
| eggs to make an omelet. |
| |
| As stewards of the Protobuf language, we believe this is the best way to get rid |
| of features that were a good idea at the time, but which history has shown to |
| have had poor outcomes. |
| |
| ### I manipulate protos reflectively, or have some other complicated use-case |
| |
| We plan to upgrade reflection to be feature-aware in a way that minimizes code |
| we need to change. We do not expect anyone to implement feature-inheritance |
| logic themselves; feature inheritance should be fully transparent to users, |
| behaving as if features had been placed explicitly everywhere. (Owners of code |
| generators should be the only ones that need to know how to correctly propagate |
| features.) |
| |
| We will be partnering with use-cases that are known risks for migration, such as |
| storage providers, to minimize toil and disruption on all sides. |
| |
| ### I want to use features to fix a defect in Protobuf |
| |
| Generally, the owner of the relevant component that ingests a particular feature |
| (`protoc` or the appropriate language backend) will own it. We will try to make |
| it as straightforward as we can to add a language-scoped feature, but it may |
| require some degree of coordination with us to get it into an edition. |
| |
| Even if it's about one of protobuf-team's backends, we'd love to hear what you |
| think we can fix, within the constraints of editions. |
| |
| ### What's your OSS strategy? |
| |
| We want to share a variant of this document with the OSS community. We plan to |
| publish migration guides and, where feasible, any migration tooling, such as the |
| `proto2`/`proto3` -> `edition` migrator. |
| |
| As stated above, we want to minimize friction for non-protobuf-team-owned |
| backends, and this ties into helping third party code generators minimize their |
| pain. |
| |
| ### I like Protobuf as it is. Can I keep my old files? |
| |
| Yes, but you get to keep both pieces. Failing to migrate off of old use-cases |
| and into newer versions that fix known defects is a risk for the entire |
| ecosystem: C++'s disastrous standardization process is a solemn warning of |
| failing to do so. |
| |
| Trying to stay on `proto2` or `proto3` will eventually cease to be supported, |
| and old editions (e.g. 5 years) will also cease to be supported. Evolution is at |
| the heart of Protobuf, and we want to make it as easy as possible for users to |
| keep up with our progress towards a better Protobuf. |
| |
| ### What do you hope to use editions to change in the short/mid term? |
| |
| An incomplete list of *ideas*, which should be taken as non-committal. |
| |
| * Eliminate `required` completely by making a particular field be optional but |
| serialized unconditionally. |
| * Make all uses of `string` require UTF-8 checking, and all uses that don't |
| want/need it `bytes`, fulfilling the original `proto3` vision. |
| * Make every `string` and `bytes` accessor in C++ return `absl::string_view`, |
| unlocking performance optimizations. |
| * Make all scalar `repeated` fields `packed`, improving throughput. |
| * Make `enum` enumerators in C++ use `kName` instead of `NAME`. |
| * Make `enum` declarations in C++ into scoped `enum class`. |
| * Make `CTYPE` into a language-scoped feature. |
| * Replace per-language, file-level options with language-scoped features. |
| * Make reflection opt-in for some languages (C++). |