# How Protobuf supports multiple C++ build systems

This document explains how the Protobuf project supports multiple C++ build
systems.

## Background

Protobuf primarily uses [Bazel](https://bazel.build) to build the Protobuf C++
runtime and Protobuf compiler[^historical_sot]. However, there are several
different build systems in common use for C++, each one of which requires
essentially a complete copy of the same build definitions.

[^historical_sot]:
    On a historical note, prior to its [release as Open Source
    Software](https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html),
    the Protobuf project was developed using Google's internal build system, which
    was the predecessor to Bazel (the vast majority of Google's contributions
    continue to be developed this way). The Open Source Protobuf project, however,
    historically used Autoconf to build the C++ implementation.
    Over time, other build systems (including Bazel) have been added, thanks in
    large part to substantial contributions from the Open Source community. Since
    the Protobuf project deals with multiple languages (all of which ultimately
    rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a
    project-wide build system -- in fact, Bazel (and its predecessor, Blaze)
    was designed in large part to support exactly this type of rich,
    multi-language build.

Currently, C++ Protobuf can be built with Bazel and CMake. Each of these build
systems has different semantics and structure, but they share the list of
files needed to build the runtime and compiler.

## Design

### Extracting information from Bazel

Bazel's Starlark API provides [aspects](https://bazel.build/rules/aspects) to
traverse the build graph, inspect build rules, define additional actions, and
expose information through
[providers](https://bazel.build/rules/rules#providers). For example, the
`cc_proto_library` rule uses an aspect to traverse the dependency graph of
`proto_library` rules, and dynamically attaches actions to generate C++ code
using the Protobuf compiler and compile using the C++ compiler.

In order to support multiple build systems, the overall build structure is
defined once for each system, and frequently changing metadata is exposed
from Bazel in a way that can be included from each build definition. Primarily,
this means exposing the list of source files in a form that other build
definitions can include.

Two aspects are used to extract this information from the Bazel build
definitions:

* `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build
  rules like `cc_library`. The sources are exposed through a provider named
  `CcFileList`.
* `proto_file_list_aspect` extracts the `srcs` from a `proto_library`, and
  also generates the expected filenames that would be generated by the
  Protobuf compiler. This information is exposed through a provider named
  `ProtoFileList`.

On their own, these aspects have limited utility. However, they can be
instantiated by custom rules, so that an ordinary `BUILD.bazel` target can
produce outputs based on the information gleaned from these aspects.

### (Aside) Distribution libraries

Bazel's native `cc_library` rule is typically used on a "fine-grained" level, so
that, for example, lightweight unit tests can be written with narrow scope.
Although Bazel does build library artifacts (such as `.so` and `.a` files on
Linux), each artifact corresponds to a single `cc_library` rule.

Since the entire "Protobuf library" includes many constituent `cc_library`
rules, a special rule, `cc_dist_library`, combines several fine-grained
libraries into a single, monolithic library.

For the Protobuf project, these "distribution libraries" are intended to match
the granularity of the CMake-based builds. Since the Bazel-built
distribution library covers the rules with the source files needed by other
builds, the `cc_dist_library` rule invokes the `cc_file_list_aspect` on its
input libraries. The result is that a `cc_dist_library` rule not only produces
composite library artifacts, but also collects and provides the list of sources
that were inputs.

For example:

```
$ cat cc_dist_library_example/BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_library")
load("//pkg:cc_dist_library.bzl", "cc_dist_library")

cc_library(
    name = "a",
    srcs = ["a.cc"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    deps = [":c"],
)

# N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
cc_library(
    name = "c",
    srcs = ["c.cc"],
)

cc_dist_library(
    name = "lib",
    deps = [
        ":a",
        ":b",
    ],
    visibility = ["//visibility:public"],
)

# Note: the output below has been formatted for clarity:
$ bazel cquery //cc_dist_library_example:lib \
    --output=starlark \
    --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
struct(
    hdrs = depset([]),
    internal_hdrs = depset([]),
    srcs = depset([
        <source file cc_dist_library_example/a.cc>,
        <source file cc_dist_library_example/b.cc>,
    ]),
    textual_hdrs = depset([]),
)
```

The upshot is that the "coarse-grained" library can be defined by the Bazel
build, which then exports the list of source files needed to reproduce the
library in a different build system.

One major difference from most Bazel rule types is that the file list aspects do
not propagate. In other words, they only expose the sources of immediate
dependencies, not transitive sources. This is for two reasons:

1. Immediate dependencies are conceptually simple, while transitivity requires
   substantially more thought. For example, if transitive dependencies were
   considered, then some way would be needed to exclude dependencies that
   should not be part of the final library (for example, a distribution library
   for `//:protobuf` could be defined not to include all of
   `//:protobuf_lite`). While dependency elision is an interesting design
   problem, the protobuf library is small enough that directly listing
   dependencies should not be problematic.
2. Dealing only with immediate dependencies gives finer-grained control over
   what goes into the composite library. For example, a Starlark `select()`
   could conditionally add fine-grained libraries to some builds, but not
   others.
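
The non-propagating behavior can be illustrated with a small model. The sketch below is plain Python, not the real Starlark implementation; `Library`, `CcFileList` (as a namedtuple), and `collect_direct_files` are hypothetical names chosen to mirror the `a`/`b`/`c` example above.

```python
from collections import namedtuple

# Hypothetical stand-ins for Bazel concepts; this is not the Starlark API.
Library = namedtuple("Library", ["name", "srcs", "hdrs", "deps"])
CcFileList = namedtuple("CcFileList", ["srcs", "hdrs"])


def collect_direct_files(dist_deps):
    """Gather files from the listed libraries only; their own deps are ignored."""
    srcs, hdrs = [], []
    for lib in dist_deps:
        srcs.extend(lib.srcs)
        hdrs.extend(lib.hdrs)
    return CcFileList(srcs=sorted(srcs), hdrs=sorted(hdrs))


c = Library("c", ["c.cc"], [], [])
a = Library("a", ["a.cc"], [], [])
b = Library("b", ["b.cc"], [], [c])  # 'b' depends on 'c'

# Only 'a' and 'b' are listed, so 'c.cc' is not collected.
file_list = collect_direct_files([a, b])
print(file_list.srcs)
```

Because the collection never follows `deps`, leaving a library out of the composite is just a matter of not listing it.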

Another subtlety for tests is due to Bazel internals. Internally, a slightly
different configuration is used when evaluating `cc_test` rules as compared to
`cc_dist_library`. If `cc_test` targets are included in a `cc_dist_library`
rule, and both are evaluated by Bazel, this can result in a build-time error:
the config used for the test contains additional options that tell Bazel how to
execute the test that the `cc_file_list_aspect` build config does not. Bazel
detects this as two conflicting actions generating the same outputs. (For
`cc_test` rules, the simplest workaround is to provide sources through a
`filegroup` or similar.)

### File list generation

Lists of input files are generated by Bazel in a format that can be imported to
other build systems. Currently only CMake-style files can be generated.

The lists of files are derived from Bazel build targets. The sources can be:

* `cc_dist_library` rules (as described above)
* `proto_library` rules
* individual files
* `filegroup` rules
* `pkg_files` or `pkg_filegroup` rules from
  https://github.com/bazelbuild/rules_pkg

For example:

```
$ cat gen_file_lists_example/BUILD.bazel
load("@protobuf//bazel:proto_library.bzl", "proto_library")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

filegroup(
    name = "doc_files",
    srcs = [
        "README.md",
        "english_paper.md",
    ],
)

proto_library(
    name = "message",
    srcs = ["message.proto"],
)

gen_cmake_file_lists(
    name = "source_lists",
    out = "source_lists.cmake",
    src_libs = {
        ":doc_files": "docs",
        ":message": "buff",
        "//cc_dist_library_example:lib": "distlib",
    },
)

$ bazel build gen_file_lists_example:source_lists
$ cat bazel-bin/gen_file_lists_example/source_lists.cmake
# Auto-generated by //gen_file_lists_example:source_lists
#
# This file contains lists of sources based on Bazel rules. It should
# be included from a hand-written CMake file that defines targets.
#
# Changes to this file will be overwritten based on Bazel definitions.

if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
  include_guard()
endif()

# //gen_file_lists_example:doc_files
set(docs_files
  gen_file_lists_example/README.md
  gen_file_lists_example/english_paper.md
)

# //gen_file_lists_example:message
set(buff_proto_srcs
  gen_file_lists_example/message.proto
)

# //gen_file_lists_example:message
set(buff_srcs
  gen_file_lists_example/message.proto.pb.cc
)

# //gen_file_lists_example:message
set(buff_hdrs
  gen_file_lists_example/message.proto.pb.h
)

# //gen_file_lists_example:message
set(buff_files
  gen_file_lists_example/message-descriptor-set.proto.bin
)

# //cc_dist_library_example:lib
set(distlib_srcs
  cc_dist_library_example/a.cc
  cc_dist_library_example/b.cc
)

# //cc_dist_library_example:lib
set(distlib_hdrs

)
```

A hand-written CMake build rule could then use the generated file to define
libraries, such as:

```
include(source_lists.cmake)
add_library(distlib ${distlib_srcs} ${buff_srcs})
```

### Protobuf usage

The main C++ runtimes (lite and full) and the Protobuf compiler use their
corresponding `cc_dist_library` rules to generate file lists. For
`proto_library` targets, the file list generation can extract the source files
directly. For other targets, notably `cc_test` targets, the file list generators
use `filegroup` rules.

In general, adding new targets to a non-Bazel build system in Protobuf (or
adding a new build system altogether) requires some one-time setup:

1. The overall structure of the new build system has to be defined. It should
   import lists of files and refer to them by variable, instead of listing
   files directly.
2. (Only if the build system is new) A new rule type has to be added to
   `//pkg:build_systems.bzl`. Most of the implementation is shared, but a
   "fragment generator" is needed to declare a file list variable, and the rule
   type itself has to be defined and has to call the shared implementation.
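
To make the split between the shared implementation and a per-build-system "fragment generator" concrete, here is a minimal Python sketch. It models the idea only; the real logic lives in Starlark in `//pkg:build_systems.bzl`, and the function names `cmake_var_fragment` and `generate_file_lists`, as well as the shape of the input map, are assumptions made for illustration.

```python
def cmake_var_fragment(owner, varname, entries):
    """Render one CMake list variable, preceded by a comment naming its source."""
    lines = ["# " + owner, "set(" + varname]
    lines.extend("  " + entry for entry in entries)
    lines.append(")")
    return "\n".join(lines) + "\n"


def generate_file_lists(src_libs, fragment_generator):
    """Shared part: walk the label -> (prefix, file lists) map and join fragments.

    Only `fragment_generator` knows the syntax of the target build system.
    """
    fragments = []
    for label, (prefix, file_lists) in src_libs.items():
        for kind, files in file_lists.items():
            fragments.append(fragment_generator(label, prefix + "_" + kind, files))
    return "\n".join(fragments)


output = generate_file_lists(
    {
        "//cc_dist_library_example:lib": (
            "distlib",
            {"srcs": ["cc_dist_library_example/a.cc", "cc_dist_library_example/b.cc"]},
        ),
    },
    cmake_var_fragment,
)
print(output)
```

Under this arrangement, supporting another build system would mean supplying a different fragment function while reusing the traversal unchanged.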

When files are added or deleted, or when the Protobuf Bazel structure is
changed, these changes may need to be reflected in the file list logic. These
are some example scenarios:

* Files are added to (or removed from) the `srcs` of an existing `cc_library`:
  no changes needed. If the `cc_library` is already part of a
  `cc_dist_library`, then regenerating the source lists will reflect the
  change.
* A `cc_library` is added: the new target may need to be added to the Protobuf
  `cc_dist_library` targets, as appropriate.
* A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted
  target, then a build-time error will result. The library needs to be removed
  from the `cc_dist_library`.
* A `cc_test` is added or deleted: test sources are handled by `filegroup`
  rules defined in the same package as the `cc_test` rule. The `filegroup`s
  are usually given a name like `"test_srcs"`, and often use `glob()` to find
  sources. This means that adding or removing a test may not require any extra
  work, but this can be verified within the same package as the test rule.
* Test-only proto files are added: the `proto_library` might need to be added
  to the file list map in `//pkg:BUILD.bazel`, and then the file added to
  various build systems. However, most test-only protos are already exposed
  through libraries like `//src/google/protobuf:test_protos`.

If there are changes, then the regenerated file lists need to be copied back
into the repo. That way, the corresponding build systems can be used with a git
checkout, without needing to run Bazel first.
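
Since the checked-in copy can drift from what Bazel would generate, a staleness check is one way to catch a missed update. The following Python sketch compares two file list contents and reports the difference. It is an illustration only: the helper name `stale_file_list_diff` and the sample contents are hypothetical, and this document does not describe how (or whether) Protobuf automates such a check.

```python
import difflib


def stale_file_list_diff(checked_in, regenerated):
    """Return unified-diff lines; an empty list means the checked-in copy is current."""
    return list(
        difflib.unified_diff(
            checked_in.splitlines(),
            regenerated.splitlines(),
            fromfile="checked-in",
            tofile="regenerated",
            lineterm="",
        )
    )


# Hypothetical contents: a source file was added in Bazel but the
# checked-in list was not refreshed.
checked_in = "set(distlib_srcs\n  a.cc\n)\n"
regenerated = "set(distlib_srcs\n  a.cc\n  b.cc\n)\n"

diff = stale_file_list_diff(checked_in, regenerated)
print("\n".join(diff))
```

A non-empty diff signals that the regenerated list should be copied back into the repo.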

### (Aside) Distribution archives

A very similar set of rules is defined in `//pkg` to build source distribution
archives for releases. In addition to the full sources, Protobuf releases also
include source archives sliced by language, so that, for example, a Ruby-based
project can get just the sources needed to build the Ruby runtime. (The
per-language slices also include sources needed to build the protobuf compiler,
so they all effectively include the C++ runtime.)

These archives are defined using rules from the
[rules_pkg](https://github.com/bazelbuild/rules_pkg) project. Although they are
similar to `cc_dist_library` and the file list generation rules, the goals are
different: the build system file lists described above only apply to C++, and
are organized according to what should or should not be included in different
parts of the build (e.g., no tests are included in the main library). On the
other hand, the distribution archives deal with languages other than C++, and
contain all the files that need to be distributed as part of a release (even for
C++, this is more than just the C++ sources).

While it might be possible to use information from the `CcFileList` and
`ProtoFileList` providers to define the distribution files, additional files
(such as the various `BUILD.bazel` files) are also needed in the distribution
archive. The lists of distribution files can usually be generated by `glob()`,
anyhow, so sharing logic with the file list aspects may not be beneficial.

Currently, all of the file lists are checked in. However, it would be possible
to build the file lists on-the-fly and include them in the distribution
archives, rather than checking them in.