# How Protobuf supports multiple C++ build systems

Author: David L. Jones (commit bd85edf, 2022-05-31 17:52:26 -0700)

This document explains how the Protobuf project supports multiple C++ build
systems.

## Background

Protobuf primarily uses [Bazel](https://bazel.build) to build the Protobuf C++
runtime and Protobuf compiler[^historical_sot]. However, there are several
different build systems in common use for C++, each one of which requires
essentially a complete copy of the same build definitions.

[^historical_sot]:
    On a historical note, prior to its [release as Open Source
    Software](https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html),
    the Protobuf project was developed using Google's internal build system, which
    was the predecessor to Bazel (the vast majority of Google's contributions
    continue to be developed this way). The Open Source Protobuf project, however,
    historically used Autoconf to build the C++ implementation.
    Over time, other build systems (including Bazel) have been added, thanks in
    large part to substantial contributions from the Open Source community. Since
    the Protobuf project deals with multiple languages (all of which ultimately
    rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a
    project-wide build system -- in fact, Bazel (and its predecessor, Blaze)
    was designed in large part to support exactly this type of rich,
    multi-language build.

Currently, C++ Protobuf can be built with Bazel, Autotools, and CMake. Each of
these build systems has different semantics and structure, but they all share
the same list of files needed to build the runtime and compiler.

## Design

### Extracting information from Bazel

Bazel's Starlark API provides [aspects](https://bazel.build/rules/aspects) to
traverse the build graph, inspect build rules, define additional actions, and
expose information through
[providers](https://bazel.build/rules/rules#providers). For example, the
`cc_proto_library` rule uses an aspect to traverse the dependency graph of
`proto_library` rules. It dynamically attaches actions that generate C++ code
with the Protobuf compiler and then compile that code with the C++ compiler.

To support multiple build systems, the overall build structure is defined once
for each system. Frequently changing metadata is exported from Bazel in a form
that those build definitions can include. Primarily, this metadata is the list
of source files.

Two aspects are used to extract this information from the Bazel build
definitions:

* `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build
  rules like `cc_library`. The sources are exposed through a provider named
  `CcFileList`.
* `proto_file_list_aspect` extracts the `srcs` from a `proto_library`. It also
  computes the names of the files that the Protobuf compiler would generate
  from them. This information is exposed through a provider named
  `ProtoFileList`.

On their own, these aspects have limited utility. However, they can be
instantiated by custom rules, so that an ordinary `BUILD.bazel` target can
produce outputs based on the information gleaned from these aspects.
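As a rough illustration of what `cc_file_list_aspect` computes for one target, the Python sketch below models it as a plain function. It is a model, not the Starlark implementation:

* `Target` is a hypothetical stand-in for a configured target, and its `attrs` dictionary stands in for the rule attributes a real aspect reads through `ctx.rule.attr`.
* `cc_file_list_model` is a hypothetical name.
* `CcFileList` here only mirrors the provider's file fields.

The point it shows is that only the rule's own file attributes are read, and `deps` is ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Target:
    """Hypothetical stand-in for a configured Bazel target."""
    label: str
    kind: str  # rule class, e.g. "cc_library" or "filegroup"
    attrs: dict = field(default_factory=dict)


@dataclass(frozen=True)
class CcFileList:
    """Model of the CcFileList provider: one tuple of files per attribute."""
    srcs: tuple = ()
    hdrs: tuple = ()
    textual_hdrs: tuple = ()


def cc_file_list_model(target: Target) -> Optional[CcFileList]:
    """Collect the target's own srcs/hdrs/textual_hdrs; deps are never visited."""
    if target.kind != "cc_library":
        return None  # this sketch only describes cc_library targets
    return CcFileList(
        srcs=tuple(target.attrs.get("srcs", ())),
        hdrs=tuple(target.attrs.get("hdrs", ())),
        textual_hdrs=tuple(target.attrs.get("textual_hdrs", ())),
    )


b = Target("//cc_dist_library_example:b", "cc_library",
           {"srcs": ["b.cc"], "deps": ["//cc_dist_library_example:c"]})
print(cc_file_list_model(b))
# CcFileList(srcs=('b.cc',), hdrs=(), textual_hdrs=())
```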

### (Aside) Distribution libraries

Bazel's native `cc_library` rule is typically used at a "fine-grained" level,
so that, for example, lightweight unit tests can be written with narrow scope.
Although Bazel does build library artifacts (such as `.so` and `.a` files on
Linux), each artifact corresponds to a single `cc_library` rule.

Since the entire "Protobuf library" includes many constituent `cc_library`
rules, a special rule, `cc_dist_library`, combines several fine-grained
libraries into a single, monolithic library.

For the Protobuf project, these "distribution libraries" are intended to match
the granularity of the Autotools- and CMake-based builds. The Bazel-built
distribution library already lists the fine-grained rules whose sources the
other builds need. For that reason, the `cc_dist_library` rule invokes the
`cc_file_list_aspect` on its input libraries. As a result, a `cc_dist_library`
rule not only produces composite library artifacts, but also collects and
provides the list of sources that were its inputs.

For example:

```
$ cat cc_dist_library_example/BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_library")
load("//pkg:cc_dist_library.bzl", "cc_dist_library")

cc_library(
    name = "a",
    srcs = ["a.cc"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    deps = [":c"],
)

# N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
cc_library(
    name = "c",
    srcs = ["c.cc"],
)

cc_dist_library(
    name = "lib",
    deps = [
        ":a",
        ":b",
    ],
    visibility = ["//visibility:public"],
)

# Note: the output below has been formatted for clarity:
$ bazel cquery //cc_dist_library_example:lib \
    --output=starlark \
    --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
struct(
    hdrs = depset([]),
    internal_hdrs = depset([]),
    srcs = depset([
        <source file cc_dist_library_example/a.cc>,
        <source file cc_dist_library_example/b.cc>,
    ]),
    textual_hdrs = depset([]),
)
```

The upshot is that the "coarse-grained" library can be defined in the Bazel
build, which then exports the list of source files needed to reproduce the
library in a different build system.

One major difference from most Bazel rule types is that the file list aspects
do not propagate. In other words, they only expose the sources of immediate
dependencies, not transitive sources. There are two reasons for this:

1. Immediate dependencies are conceptually simple, while transitivity requires
   substantially more thought. For example, if transitive dependencies were
   considered, then some way would be needed to exclude dependencies that
   should not be part of the final library (for example, a distribution library
   for `//:protobuf` could be defined not to include all of
   `//:protobuf_lite`). While dependency elision is an interesting design
   problem, the protobuf library is small enough that directly listing
   dependencies should not be problematic.
2. Dealing only with immediate dependencies gives finer-grained control over
   what goes into the composite library. For example, a Starlark `select()`
   could conditionally add fine-grained libraries to some builds, but not
   others.
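A small Python model of the `cc_dist_library_example` package from this section shows the difference between the immediate-dependency behavior and a transitive walk. `Library`, `immediate_srcs`, and `transitive_srcs` are hypothetical names. The model ignores headers and build configuration and tracks only `srcs`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Library:
    """Hypothetical stand-in for a cc_library target."""
    name: str
    srcs: List[str]
    deps: List["Library"] = field(default_factory=list)


def immediate_srcs(dist_deps: List[Library]) -> List[str]:
    """Non-propagating collection: only the srcs of the listed libraries."""
    out: List[str] = []
    for lib in dist_deps:
        out.extend(lib.srcs)
    return out


def transitive_srcs(dist_deps: List[Library]) -> List[str]:
    """The alternative the design avoids: follow deps recursively."""
    out: List[str] = []
    seen = set()

    def visit(lib: Library) -> None:
        if lib.name in seen:
            return
        seen.add(lib.name)
        out.extend(lib.srcs)
        for dep in lib.deps:
            visit(dep)

    for lib in dist_deps:
        visit(lib)
    return out


# ":lib" lists ":a" and ":b"; ":b" depends on ":c".
c = Library("c", ["c.cc"])
a = Library("a", ["a.cc"])
b = Library("b", ["b.cc"], deps=[c])

print(immediate_srcs([a, b]))   # ['a.cc', 'b.cc']
print(transitive_srcs([a, b]))  # ['a.cc', 'b.cc', 'c.cc']
```

Only the immediate version matches the `CcFileList` output in the example, where `c.cc` is absent. In Starlark, an aspect declared without `attr_aspects` (the default) does not propagate. It is applied only to the targets listed directly in the attribute that requests it. Whether `//pkg:cc_dist_library.bzl` is written exactly this way is not shown in this document.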

Tests have another subtlety, due to Bazel internals. Internally, a slightly
different configuration is used when evaluating `cc_test` rules than when
evaluating `cc_dist_library` rules. If `cc_test` targets are included in a
`cc_dist_library` rule, and both are evaluated by Bazel, this can result in a
build-time error. The configuration used for the test contains additional
options (which tell Bazel how to execute the test) that the
`cc_file_list_aspect` build configuration lacks. Bazel detects this as two
conflicting actions generating the same outputs. (For `cc_test` rules, the
simplest workaround is to provide sources through a `filegroup` or similar.)

### File list generation

Bazel generates lists of input files in a format that can be imported into
other build systems. Currently, Automake- and CMake-style files can be
generated.

The lists of files are derived from Bazel build targets. The sources can be:

* `cc_dist_library` rules (as described above)
* `proto_library` rules
* individual files
* `filegroup` rules
* `pkg_files` or `pkg_filegroup` rules from
  [rules_pkg](https://github.com/bazelbuild/rules_pkg)
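The variable names in a generated file follow a visible pattern. Each `src_libs` entry maps a target to a prefix, and each kind of file the target provides becomes `<prefix>_<category>`. The Python sketch below reproduces that naming as inferred from the sample CMake output in this section. It is not taken from `//pkg:build_systems.bzl`, and `file_list_variables` and the `provided` dictionary are hypothetical.

```python
from typing import Dict, List

# Category suffixes, in the order they appear in the sample output.
SUFFIXES = ("proto_srcs", "srcs", "hdrs", "files")


def file_list_variables(
    prefix: str, provided: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Name each category of files a target provides as <prefix>_<category>.

    `provided` summarizes a target's providers: "srcs"/"hdrs" for a
    cc_dist_library, "proto_srcs"/"srcs"/"hdrs"/"files" for a proto_library,
    and "files" for a filegroup. Empty categories are still emitted.
    """
    return {
        f"{prefix}_{key}": list(provided[key])
        for key in SUFFIXES
        if key in provided
    }


variables: Dict[str, List[str]] = {}
variables.update(file_list_variables("docs", {"files": ["README.md"]}))
variables.update(
    file_list_variables("distlib", {"srcs": ["a.cc", "b.cc"], "hdrs": []})
)
print(list(variables))  # ['docs_files', 'distlib_srcs', 'distlib_hdrs']
```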

For example:

```
$ cat gen_file_lists_example/BUILD.bazel
load("@rules_proto//proto:defs.bzl", "proto_library")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

filegroup(
    name = "doc_files",
    srcs = [
        "README.md",
        "english_paper.md",
    ],
)

proto_library(
    name = "message",
    srcs = ["message.proto"],
)

gen_cmake_file_lists(
    name = "source_lists",
    out = "source_lists.cmake",
    src_libs = {
        ":doc_files": "docs",
        ":message": "buff",
        "//cc_dist_library_example:lib": "distlib",
    },
)

$ bazel build gen_file_lists_example:source_lists
$ cat bazel-bin/gen_file_lists_example/source_lists.cmake
# Auto-generated by //gen_file_lists_example:source_lists
#
# This file contains lists of sources based on Bazel rules. It should
# be included from a hand-written CMake file that defines targets.
#
# Changes to this file will be overwritten based on Bazel definitions.

if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
  include_guard()
endif()

# //gen_file_lists_example:doc_files
set(docs_files
  gen_file_lists_example/README.md
  gen_file_lists_example/english_paper.md
)

# //gen_file_lists_example:message
set(buff_proto_srcs
  gen_file_lists_example/message.proto
)

# //gen_file_lists_example:message
set(buff_srcs
  gen_file_lists_example/message.proto.pb.cc
)

# //gen_file_lists_example:message
set(buff_hdrs
  gen_file_lists_example/message.proto.pb.h
)

# //gen_file_lists_example:message
set(buff_files
  gen_file_lists_example/message-descriptor-set.proto.bin
)

# //cc_dist_library_example:lib
set(distlib_srcs
  cc_dist_library_example/a.cc
  cc_dist_library_example/b.cc
)

# //cc_dist_library_example:lib
set(distlib_hdrs

)
```

A hand-written CMake build rule could then use the generated file to define
libraries, such as:

```
include(source_lists.cmake)
add_library(distlib ${distlib_srcs} ${buff_srcs})
```

In addition to `gen_cmake_file_lists`, there is also a `gen_automake_file_lists`
rule. These rules share most of their implementation, but define different
file headers and different Starlark "fragment generator" functions, which
format the generated list variables.
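The split between a shared implementation and per-build-system "fragment generators" can be sketched in Python as follows.

* The CMake fragment mirrors the `set(...)` layout of the sample output in this document.
* The Automake fragment uses ordinary backslash-continued `name = ...` variable syntax. This is an assumption about the real generated Automake file.
* `cmake_fragment`, `automake_fragment`, and `render_file_lists` are hypothetical names, not the Starlark functions in `//pkg:build_systems.bzl`.

```python
from typing import Callable, List, Tuple

# A fragment generator formats one list variable: (label, name, files) -> text.
FragmentGenerator = Callable[[str, str, List[str]], str]


def cmake_fragment(label: str, var: str, files: List[str]) -> str:
    """Format a CMake list variable with one indented file per line."""
    body = "".join(f"  {f}\n" for f in files)
    return f"# {label}\nset({var}\n{body})\n"


def automake_fragment(label: str, var: str, files: List[str]) -> str:
    """Format an Automake variable; every line but the last ends in a backslash."""
    lines = [f"{var} ="] + [f"  {f}" for f in files]
    return f"# {label}\n" + " \\\n".join(lines) + "\n"


def render_file_lists(
    header: str,
    entries: List[Tuple[str, str, List[str]]],
    fragment: FragmentGenerator,
) -> str:
    """Shared part: only the header and the fragment generator vary."""
    return header + "".join(
        "\n" + fragment(label, var, files) for label, var, files in entries
    )


entries = [
    ("//gen_file_lists_example:doc_files", "docs_files",
     ["gen_file_lists_example/README.md"]),
]
print(render_file_lists("# CMake file lists\n", entries, cmake_fragment))
print(render_file_lists("# Automake file lists\n", entries, automake_fragment))
```

In this model, supporting another build system only requires a new header string and fragment function. `render_file_lists` itself stays unchanged.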

### Protobuf usage

The main C++ runtimes (lite and full) and the Protobuf compiler use their
corresponding `cc_dist_library` rules to generate file lists. For
`proto_library` targets, the file list generation can extract the source files
directly. For other targets, notably `cc_test` targets, the file list generators
use `filegroup` rules.

In general, adding new targets to a non-Bazel build system in Protobuf (or
adding a new build system altogether) requires some one-time setup:

1. The overall structure of the new build system has to be defined. It should
   import lists of files and refer to them by variable, instead of listing
   files directly.
2. (Only if the build system is new) A new rule type has to be added to
   `//pkg:build_systems.bzl`. Most of the implementation is shared, but a
   "fragment generator" is needed to declare a file list variable. The rule
   type itself also has to be defined and must call the shared implementation.

When files are added or deleted, or when the Protobuf Bazel structure is
changed, these changes may need to be reflected in the file list logic. These
are some example scenarios:

* Files are added to (or removed from) the `srcs` of an existing `cc_library`:
  no changes needed. If the `cc_library` is already part of a
  `cc_dist_library`, then regenerating the source lists will reflect the
  change.
* A `cc_library` is added: the new target may need to be added to the Protobuf
  `cc_dist_library` targets, as appropriate.
* A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted
  target, then a build-time error will result. The library needs to be removed
  from the `cc_dist_library`.
* A `cc_test` is added or deleted: test sources are handled by `filegroup`
  rules defined in the same package as the `cc_test` rule. The `filegroup`s
  are usually given a name like `"test_srcs"`, and often use `glob()` to find
  sources. This means that adding or removing a test may not require any extra
  work, but this can be verified within the same package as the test rule.
* Test-only proto files are added: the `proto_library` might need to be added
  to the file list map in `//pkg:BUILD.bazel`, and then the file added to
  various build systems. However, most test-only protos are already exposed
  through libraries like `//src/google/protobuf:test_protos`.
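The two `cc_library` scenarios above lend themselves to a mechanical check. The following Python sketch is a hypothetical helper, not part of the Protobuf repository. Given the existing `cc_library` labels and the deps of each `cc_dist_library`, it reports two things:

* deps that no longer exist, which is the case Bazel turns into a build-time error;
* libraries that no distribution library covers, which are only candidates to review, since not every `cc_library` belongs in one.

All labels in the example are made up.

```python
from typing import Dict, Iterable, List, Tuple


def audit_dist_libraries(
    cc_libraries: Iterable[str],
    dist_libraries: Dict[str, List[str]],
    exempt: Iterable[str] = (),
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (missing deps per dist library, libraries no dist library covers)."""
    existing = set(cc_libraries)
    missing: Dict[str, List[str]] = {}
    for dist, deps in dist_libraries.items():
        gone = [d for d in deps if d not in existing]
        if gone:
            missing[dist] = gone  # Bazel would fail to resolve these labels
    covered = {d for deps in dist_libraries.values() for d in deps}
    uncovered = sorted(existing - covered - set(exempt))
    return missing, uncovered


missing, uncovered = audit_dist_libraries(
    cc_libraries=["//lib:a", "//lib:b", "//lib:new", "//lib:test_util"],
    dist_libraries={
        "//pkg:lite": ["//lib:a"],
        "//pkg:full": ["//lib:a", "//lib:b", "//lib:deleted"],
    },
    exempt=["//lib:test_util"],  # e.g. test-only helpers kept out on purpose
)
print(missing)    # {'//pkg:full': ['//lib:deleted']}
print(uncovered)  # ['//lib:new']
```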

If there are changes, then the regenerated file lists need to be copied back
into the repo. That way, the corresponding build systems can be used with a git
checkout, without needing to run Bazel first.
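Copying the regenerated lists back can be scripted. The Python sketch below is a hypothetical helper; the function names and file paths are illustrative, not Protobuf's actual tooling. It compares a freshly generated file list with the checked-in copy and returns a unified diff for review. It overwrites the checked-in copy only when the contents differ.

```python
import difflib
from pathlib import Path


def stale_diff(generated: str, checked_in: str, name: str) -> str:
    """Unified diff from the checked-in copy to the generated one; "" if identical."""
    return "".join(
        difflib.unified_diff(
            checked_in.splitlines(keepends=True),
            generated.splitlines(keepends=True),
            fromfile=f"checked-in/{name}",
            tofile=f"generated/{name}",
        )
    )


def sync_file_list(generated_path: Path, checked_in_path: Path) -> bool:
    """Copy a regenerated list over the checked-in copy; return True if it changed."""
    generated = generated_path.read_text()
    current = checked_in_path.read_text() if checked_in_path.exists() else ""
    if generated == current:
        return False
    checked_in_path.write_text(generated)
    return True


# Hypothetical usage after `bazel build` has produced the generated file
# (both paths are illustrative):
#   sync_file_list(Path("bazel-bin/pkg/file_lists.cmake"),
#                  Path("src/file_lists.cmake"))
```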

### (Aside) Distribution archives

A very similar set of rules is defined in `//pkg` to build source distribution
archives for releases. In addition to the full sources, Protobuf releases also
include source archives sliced by language, so that, for example, a Ruby-based
project can get just the sources needed to build the Ruby runtime. (The
per-language slices also include sources needed to build the protobuf compiler,
so they all effectively include the C++ runtime.)

These archives are defined using rules from the
[rules_pkg](https://github.com/bazelbuild/rules_pkg) project. Although they are
similar to `cc_dist_library` and the file list generation rules, the goals are
different: the build system file lists described above only apply to C++, and
are organized according to what should or should not be included in different
parts of the build (e.g., no tests are included in the main library). On the
other hand, the distribution archives deal with languages other than C++, and
contain all the files that need to be distributed as part of a release (even for
C++, this is more than just the C++ sources).

While it might be possible to use information from the `CcFileList` and
`ProtoFileList` providers to define the distribution files, additional files
(such as the various `BUILD.bazel` files) are also needed in the distribution
archive. In any case, the lists of distribution files can usually be generated
by `glob()`, so sharing logic with the file list aspects may not be beneficial.

Currently, all of the file lists are checked in. However, it would be possible
to build the file lists on the fly and include them in the distribution
archives, rather than checking them in.