# How Protobuf supports multiple C++ build systems

This document explains how the Protobuf project supports multiple C++ build
systems.

## Background

Protobuf primarily uses [Bazel](https://bazel.build) to build the Protobuf C++
runtime and Protobuf compiler[^historical_sot]. However, there are several
different build systems in common use for C++, each one of which requires
essentially a complete copy of the same build definitions.

[^historical_sot]:
    On a historical note, prior to its [release as Open Source
    Software](https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html),
    the Protobuf project was developed using Google's internal build system, which
    was the predecessor to Bazel (the vast majority of Google's contributions
    continue to be developed this way). The Open Source Protobuf project, however,
    historically used Autoconf to build the C++ implementation.
    Over time, other build systems (including Bazel) have been added, thanks in
    large part to substantial contributions from the Open Source community. Since
    the Protobuf project deals with multiple languages (all of which ultimately
    rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a
    project-wide build system -- in fact, Bazel (and its predecessor, Blaze)
    was designed in large part to support exactly this type of rich,
    multi-language build.

Currently, C++ Protobuf can be built with Bazel, Autotools, and CMake. These
build systems differ in semantics and structure, but they all share the list of
files needed to build the runtime and compiler.

## Design

### Extracting information from Bazel

Bazel's Starlark API provides [aspects](https://bazel.build/rules/aspects) to
traverse the build graph, inspect build rules, define additional actions, and
expose information through
[providers](https://bazel.build/rules/rules#providers). For example, the
`cc_proto_library` rule uses an aspect to traverse the dependency graph of
`proto_library` rules, and dynamically attaches actions to generate C++ code
using the Protobuf compiler and compile using the C++ compiler.

In order to support multiple build systems, the overall build structure is
defined once for each system, and frequently-changing metadata is exposed from
Bazel in a way that can be included from each build definition. Primarily, this
means exposing the list of source files.

Two aspects are used to extract this information from the Bazel build
definitions:

* `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build
  rules like `cc_library`. The sources are exposed through a provider named
  `CcFileList`.
* `proto_file_list_aspect` extracts the `srcs` from a `proto_library`, and
  also generates the expected filenames that would be generated by the
  Protobuf compiler. This information is exposed through a provider named
  `ProtoFileList`.

On their own, these aspects have limited utility. However, they can be
instantiated by custom rules, so that an ordinary `BUILD.bazel` target can
produce outputs based on the information gleaned from these aspects.

### (Aside) Distribution libraries

Bazel's native `cc_library` rule is typically used on a "fine-grained" level, so
that, for example, lightweight unit tests can be written with narrow scope.
Although Bazel does build library artifacts (such as `.so` and `.a` files on
Linux), they correspond to individual `cc_library` rules.

Since the entire "Protobuf library" includes many constituent `cc_library`
rules, a special rule, `cc_dist_library`, combines several fine-grained
libraries into a single, monolithic library.

For the Protobuf project, these "distribution libraries" are intended to match
the granularity of the Autotools- and CMake-based builds. Since the Bazel-built
distribution library covers the rules with the source files needed by other
builds, the `cc_dist_library` rule invokes the `cc_file_list_aspect` on its
input libraries. The result is that a `cc_dist_library` rule not only produces
composite library artifacts, but also collects and provides the list of sources
that were inputs.

For example:

```
$ cat cc_dist_library_example/BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_library")
load("//pkg:cc_dist_library.bzl", "cc_dist_library")

cc_library(
    name = "a",
    srcs = ["a.cc"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    deps = [":c"],
)

# N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
cc_library(
    name = "c",
    srcs = ["c.cc"],
)

cc_dist_library(
    name = "lib",
    deps = [
        ":a",
        ":b",
    ],
    visibility = ["//visibility:public"],
)

# Note: the output below has been formatted for clarity:
$ bazel cquery //cc_dist_library_example:lib \
    --output=starlark \
    --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
struct(
    hdrs = depset([]),
    internal_hdrs = depset([]),
    srcs = depset([
        <source file cc_dist_library_example/a.cc>,
        <source file cc_dist_library_example/b.cc>,
    ]),
    textual_hdrs = depset([]),
)
```

The upshot is that the "coarse-grained" library can be defined by the Bazel
build, which then exports the list of source files needed to reproduce the
library in a different build system.

One major difference from most Bazel rule types is that the file list aspects do
not propagate. In other words, they only expose the immediate dependency's
sources, not transitive sources. This is for two reasons:

1. Immediate dependencies are conceptually simple, while transitivity requires
   substantially more thought. For example, if transitive dependencies were
   considered, then some way would be needed to exclude dependencies that
   should not be part of the final library (for example, a distribution library
   for `//:protobuf` could be defined not to include all of
   `//:protobuf_lite`). While dependency elision is an interesting design
   problem, the protobuf library is small enough that directly listing
   dependencies should not be problematic.
2. Dealing only with immediate dependencies gives finer-grained control over
   what goes into the composite library. For example, a Starlark `select()`
   could conditionally add fine-grained libraries to some builds, but not
   others.

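The difference between immediate and transitive collection can be modeled in a
few lines of Python. This is only an illustrative sketch, not the Starlark
implementation: the `CcLibrary` class and both helper functions are hypothetical
stand-ins, and the targets mirror the `a`, `b`, and `c` libraries from the
example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcLibrary:
    """Toy stand-in for a cc_library target (not a Bazel API)."""
    name: str
    srcs: tuple = ()
    deps: tuple = ()


def direct_srcs(dist_deps):
    """Collect sources from the listed libraries only (no propagation)."""
    return [src for lib in dist_deps for src in lib.srcs]


def transitive_srcs(dist_deps):
    """Hypothetical alternative: also walk each library's deps, depth-first."""
    seen, out = set(), []

    def visit(lib):
        if lib.name in seen:
            return
        seen.add(lib.name)
        out.extend(lib.srcs)
        for dep in lib.deps:
            visit(dep)

    for lib in dist_deps:
        visit(lib)
    return out


c = CcLibrary("c", srcs=("c.cc",))
a = CcLibrary("a", srcs=("a.cc",))
b = CcLibrary("b", srcs=("b.cc",), deps=(c,))

print(direct_srcs([a, b]))      # ['a.cc', 'b.cc']
print(transitive_srcs([a, b]))  # ['a.cc', 'b.cc', 'c.cc']
```
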
Another subtlety, which affects tests, is due to Bazel internals. Bazel uses a
slightly different configuration when evaluating `cc_test` rules than when
evaluating `cc_dist_library` rules. If `cc_test` targets are included in a
`cc_dist_library` rule, and both are evaluated by Bazel, this can result in a
build-time error: the configuration used for the test contains additional
options, which tell Bazel how to execute the test, that the configuration used
for `cc_file_list_aspect` does not. Bazel detects this as two conflicting
actions generating the same outputs. (For `cc_test` rules, the simplest
workaround is to provide sources through a `filegroup` or similar.)

### File list generation

Lists of input files are generated by Bazel in a format that can be imported to
other build systems. Currently, Automake- and CMake-style files can be
generated.

The lists of files are derived from Bazel build targets. The sources can be:

* `cc_dist_library` rules (as described above)
* `proto_library` rules
* individual files
* `filegroup` rules
* `pkg_files` or `pkg_filegroup` rules from
  https://github.com/bazelbuild/rules_pkg

For example:

```
$ cat gen_file_lists_example/BUILD.bazel
load("@rules_proto//proto:defs.bzl", "proto_library")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

filegroup(
    name = "doc_files",
    srcs = [
        "README.md",
        "english_paper.md",
    ],
)

proto_library(
    name = "message",
    srcs = ["message.proto"],
)

gen_cmake_file_lists(
    name = "source_lists",
    out = "source_lists.cmake",
    src_libs = {
        ":doc_files": "docs",
        ":message": "buff",
        "//cc_dist_library_example:lib": "distlib",
    },
)

$ bazel build gen_file_lists_example:source_lists
$ cat bazel-bin/gen_file_lists_example/source_lists.cmake
# Auto-generated by //gen_file_lists_example:source_lists
#
# This file contains lists of sources based on Bazel rules. It should
# be included from a hand-written CMake file that defines targets.
#
# Changes to this file will be overwritten based on Bazel definitions.

if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
  include_guard()
endif()

# //gen_file_lists_example:doc_files
set(docs_files
  gen_file_lists_example/README.md
  gen_file_lists_example/english_paper.md
)

# //gen_file_lists_example:message
set(buff_proto_srcs
  gen_file_lists_example/message.proto
)

# //gen_file_lists_example:message
set(buff_srcs
  gen_file_lists_example/message.proto.pb.cc
)

# //gen_file_lists_example:message
set(buff_hdrs
  gen_file_lists_example/message.proto.pb.h
)

# //gen_file_lists_example:message
set(buff_files
  gen_file_lists_example/message-descriptor-set.proto.bin
)

# //cc_dist_library_example:lib
set(distlib_srcs
  cc_dist_library_example/a.cc
  cc_dist_library_example/b.cc
)

# //cc_dist_library_example:lib
set(distlib_hdrs

)
```

A hand-written CMake build rule could then use the generated file to define
libraries, such as:

```
include(source_lists.cmake)
add_library(distlib ${distlib_srcs} ${buff_srcs})
```

In addition to `gen_cmake_file_lists`, there is also a `gen_automake_file_lists`
rule. These rules actually share most of the same implementation, but define
different file headers and different Starlark "fragment generator" functions
which format the generated list variables.

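The split between a shared implementation and per-system "fragment generators"
might look roughly like the following Python sketch. It is not the Starlark code
in `//pkg:build_systems.bzl`: all three function names are hypothetical, and the
Automake layout in particular is an assumption chosen for illustration. Only the
CMake `set()` layout is taken from the generated output shown above.

```python
def cmake_fragment(label, name, files):
    """Format one file list as a CMake set() command."""
    lines = ["# " + label, "set(" + name]
    lines += ["  " + f for f in files]
    lines.append(")")
    return "\n".join(lines) + "\n"


def automake_fragment(label, name, files):
    """Format one file list as an Automake variable (assumed layout)."""
    if not files:
        return "# {}\n{} =\n".format(label, name)
    body = " \\\n".join("  " + f for f in files)
    return "# {}\n{} = \\\n{}\n".format(label, name, body)


def generate_file_lists(header, fragment_generator, lists):
    """Shared part: emit the header, then one fragment per (label, name, files)."""
    parts = [header]
    parts += [fragment_generator(label, name, files)
              for label, name, files in lists]
    return "\n".join(parts)


lists = [(
    "//cc_dist_library_example:lib",
    "distlib_srcs",
    ["cc_dist_library_example/a.cc", "cc_dist_library_example/b.cc"],
)]

# The same lists rendered for two build systems; only the header and the
# fragment generator differ.
print(generate_file_lists("# Auto-generated (CMake)\n", cmake_fragment, lists))
print(generate_file_lists("# Auto-generated (Automake)\n", automake_fragment, lists))
```
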
### Protobuf usage

The main C++ runtimes (lite and full) and the Protobuf compiler use their
corresponding `cc_dist_library` rules to generate file lists. For
`proto_library` targets, the file list generation can extract the source files
directly. For other targets, notably `cc_test` targets, the file list generators
use `filegroup` rules.

278
279In general, adding new targets to a non-Bazel build system in Protobuf (or
280adding a new build system altogether) requires some one-time setup:
281
2821. The overall structure of the new build system has to be defined. It should
283 import lists of files and refer to them by variable, instead of listing
284 files directly.
2852. (Only if the build system is new) A new rule type has to be added to
286 `//pkg:build_systems.bzl`. Most of the implementation is shared, but a
287 "fragment generator" is need to declare a file list variable, and the rule
288 type itself has to be defined and call the shared implementation.
289
When files are added or deleted, or when the Protobuf Bazel structure is
changed, these changes may need to be reflected in the file list logic. These
are some example scenarios:

* Files are added to (or removed from) the `srcs` of an existing `cc_library`:
  no changes needed. If the `cc_library` is already part of a
  `cc_dist_library`, then regenerating the source lists will reflect the
  change.
* A `cc_library` is added: the new target may need to be added to the Protobuf
  `cc_dist_library` targets, as appropriate.
* A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted
  target, then a build-time error will result. The library needs to be removed
  from the `cc_dist_library`.
* A `cc_test` is added or deleted: test sources are handled by `filegroup`
  rules defined in the same package as the `cc_test` rule. The `filegroup`s
  are usually given a name like `"test_srcs"`, and often use `glob()` to find
  sources. This means that adding or removing a test may not require any extra
  work, but this can be verified within the same package as the test rule.
* Test-only proto files are added: the `proto_library` might need to be added
  to the file list map in `//pkg:BUILD.bazel`, and then the file added to
  various build systems. However, most test-only protos are already exposed
  through libraries like `//src/google/protobuf:test_protos`.

If there are changes, then the regenerated file lists need to be copied back
into the repo. That way, the corresponding build systems can be used with a git
checkout, without needing to run Bazel first.

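One way to notice that a checked-in file list has fallen behind is to compare it
with a freshly generated copy. The Python sketch below illustrates that idea
only; it is not a tool from the Protobuf repository, and the `staleness_diff`
helper and the sample file contents are hypothetical.

```python
import difflib


def staleness_diff(checked_in, generated, name="source_lists.cmake"):
    """Return unified-diff lines; an empty list means the checked-in copy is current."""
    return list(difflib.unified_diff(
        checked_in.splitlines(),
        generated.splitlines(),
        fromfile="checked-in/" + name,
        tofile="generated/" + name,
        lineterm="",
    ))


# Sample contents: b.cc was added to a library after the list was checked in.
checked_in = "set(distlib_srcs\n  a.cc\n)\n"
generated = "set(distlib_srcs\n  a.cc\n  b.cc\n)\n"

diff = staleness_diff(checked_in, generated)
if diff:
    print("File list is stale; copy the regenerated file back into the repo:")
    print("\n".join(diff))
```
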
### (Aside) Distribution archives

A very similar set of rules is defined in `//pkg` to build source distribution
archives for releases. In addition to the full sources, Protobuf releases also
include source archives sliced by language, so that, for example, a Ruby-based
project can get just the sources needed to build the Ruby runtime. (The
per-language slices also include sources needed to build the protobuf compiler,
so they all effectively include the C++ runtime.)

These archives are defined using rules from the
[rules_pkg](https://github.com/bazelbuild/rules_pkg) project. Although they are
similar to `cc_dist_library` and the file list generation rules, the goals are
different: the build system file lists described above only apply to C++, and
are organized according to what should or should not be included in different
parts of the build (e.g., no tests are included in the main library). On the
other hand, the distribution archives deal with languages other than C++, and
contain all the files that need to be distributed as part of a release (even for
C++, this is more than just the C++ sources).

While it might be possible to use information from the `CcFileList` and
`ProtoFileList` providers to define the distribution files, additional files
(such as the various `BUILD.bazel` files) are also needed in the distribution
archive. The lists of distribution files can usually be generated by `glob()`,
anyhow, so sharing logic with the file list aspects may not be beneficial.

Currently, all of the file lists are checked in. However, it would be possible
to build the file lists on-the-fly and include them in the distribution
archives, rather than checking them in.