This document explains how the Protobuf project supports multiple C++ build systems.
Protobuf primarily uses Bazel to build the Protobuf C++ runtime and Protobuf compiler[^historical_sot]. However, there are several different build systems in common use for C++, each one of which requires essentially a complete copy of the same build definitions.
[^historical_sot]: On a historical note, prior to its release as Open Source Software, the Protobuf project was developed using Google’s internal build system, which was the predecessor to Bazel (the vast majority of Google’s contributions continue to be developed this way). The Open Source Protobuf project, however, historically used Autoconf to build the C++ implementation. Over time, other build systems (including Bazel) have been added, thanks in large part to substantial contributions from the Open Source community. Since the Protobuf project deals with multiple languages (all of which ultimately rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a project-wide build system -- in fact, Bazel (and its predecessor, Blaze) was designed in large part to support exactly this type of rich, multi-language build.
Currently, C++ Protobuf can be built with Bazel and CMake. Each of these build systems has different semantics and structure, but they share the list of files needed to build the runtime and compiler.
Bazel's Starlark API provides aspects to traverse the build graph, inspect build rules, define additional actions, and expose information through providers. For example, the `cc_proto_library` rule uses an aspect to traverse the dependency graph of `proto_library` rules, and dynamically attaches actions to generate C++ code using the Protobuf compiler and compile using the C++ compiler.
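As a concrete illustration of the rule pair mentioned above, a minimal `BUILD.bazel` sketch might look like the following. The target names are hypothetical, and the `load` labels assume a recent Protobuf release that ships these `.bzl` files under `@protobuf//bazel`; other setups may load the rules from a different location.

```starlark
load("@protobuf//bazel:cc_proto_library.bzl", "cc_proto_library")
load("@protobuf//bazel:proto_library.bzl", "proto_library")

# Hypothetical target names, for illustration only.
proto_library(
    name = "message_proto",
    srcs = ["message.proto"],
)

# This target lists no sources of its own. The aspect attached to `deps`
# visits the proto_library graph and registers the actions that run the
# Protobuf compiler and then the C++ compiler.
cc_proto_library(
    name = "message_cc_proto",
    deps = [":message_proto"],
)
```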
To support multiple build systems, the overall build structure is defined once for each system, and frequently-changing metadata is exposed from Bazel in a way that can be included from each build definition. Primarily, this means exposing the list of source files in a form that other build definitions can include.
Two aspects are used to extract this information from the Bazel build definitions:

* `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build rules like `cc_library`. The sources are exposed through a provider named `CcFileList`.
* `proto_file_list_aspect` extracts the `srcs` from a `proto_library`, and also generates the expected filenames that would be generated by the Protobuf compiler. This information is exposed through a provider named `ProtoFileList`.

On their own, these aspects have limited utility. However, they can be instantiated by custom rules, so that an ordinary `BUILD.bazel` target can produce outputs based on the information gleaned from these aspects.
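The pattern of an aspect feeding a custom rule can be sketched roughly as follows. This is a simplified illustration, not the implementation in `//pkg`: the names `ExampleFileList`, `example_file_list_aspect`, and `example_file_list` are hypothetical, and only `srcs` is collected.

```starlark
# Hypothetical provider; the real aspects expose CcFileList / ProtoFileList.
ExampleFileList = provider(
    doc = "Source files declared directly on one target.",
    fields = {"srcs": "depset of source Files"},
)

def _example_file_list_aspect_impl(target, ctx):
    # Rules without a `srcs` attribute simply contribute nothing.
    srcs = getattr(ctx.rule.attr, "srcs", [])
    return [ExampleFileList(
        srcs = depset(transitive = [src.files for src in srcs]),
    )]

# No `attr_aspects` is given, so this sketch visits only the targets it is
# attached to and does not follow their dependencies.
example_file_list_aspect = aspect(
    implementation = _example_file_list_aspect_impl,
)

def _example_file_list_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    paths = []
    for dep in ctx.attr.deps:
        # Flattening the depset here keeps the sketch short.
        paths.extend([f.short_path for f in dep[ExampleFileList].srcs.to_list()])
    ctx.actions.write(output = out, content = "\n".join(paths) + "\n")
    return [DefaultInfo(files = depset([out]))]

# An ordinary rule that instantiates the aspect on its `deps`.
example_file_list = rule(
    implementation = _example_file_list_impl,
    attrs = {
        "deps": attr.label_list(aspects = [example_file_list_aspect]),
    },
)
```

A `BUILD.bazel` target such as `example_file_list(name = "list", deps = [":a"])` would then write the direct sources of `:a` to a text file.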
Bazel's native `cc_library` rule is typically used on a “fine-grained” level, so that, for example, lightweight unit tests can be written with narrow scope. Although Bazel does build library artifacts (such as `.so` and `.a` files on Linux), they correspond to `cc_library` rules.
Since the entire “Protobuf library” includes many constituent `cc_library` rules, a special rule, `cc_dist_library`, combines several fine-grained libraries into a single, monolithic library.
For the Protobuf project, these “distribution libraries” are intended to match the granularity of the CMake-based builds. Since the Bazel-built distribution library covers the rules with the source files needed by other builds, the `cc_dist_library` rule invokes the `cc_file_list_aspect` on its input libraries. The result is that a `cc_dist_library` rule not only produces composite library artifacts, but also collects and provides the list of sources that were inputs.
For example:

    $ cat cc_dist_library_example/BUILD.bazel
    load("@rules_cc//cc:defs.bzl", "cc_library")
    load("//pkg:cc_dist_library.bzl", "cc_dist_library")

    cc_library(
        name = "a",
        srcs = ["a.cc"],
    )

    cc_library(
        name = "b",
        srcs = ["b.cc"],
        deps = [":c"],
    )

    # N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
    cc_library(
        name = "c",
        srcs = ["c.cc"],
    )

    cc_dist_library(
        name = "lib",
        deps = [
            ":a",
            ":b",
        ],
        visibility = ["//visibility:public"],
    )

    # Note: the output below has been formatted for clarity:
    $ bazel cquery //cc_dist_library_example:lib \
        --output=starlark \
        --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
    struct(
        hdrs = depset([]),
        internal_hdrs = depset([]),
        srcs = depset([
            <source file cc_dist_library_example/a.cc>,
            <source file cc_dist_library_example/b.cc>,
        ]),
        textual_hdrs = depset([]),
    )

The upshot is that the “coarse-grained” library can be defined by the Bazel build, which can then export the list of source files that are needed to reproduce the library in a different build system.
One major difference from most Bazel rule types is that the file list aspects do not propagate. In other words, they only expose the immediate dependency's sources, not transitive sources. This is for two reasons:

* Collecting transitive sources would require a way to leave out dependencies that should not be part of the final library (for example, `//:protobuf` could be defined not to include all of `//:protobuf_lite`). While dependency elision is an interesting design problem, the protobuf library is small enough that directly listing dependencies should not be problematic.
* A `select()` could conditionally add fine-grained libraries to some builds, but not others.

Another subtlety for tests is due to Bazel internals. Internally, a slightly different configuration is used when evaluating `cc_test` rules as compared to `cc_dist_library`. If `cc_test` targets are included in a `cc_dist_library` rule, and both are evaluated by Bazel, this can result in a build-time error: the config used for the test contains additional options that tell Bazel how to execute the test, which the `cc_file_list_aspect` build config does not. Bazel detects this as two conflicting actions generating the same outputs. (For `cc_test` rules, the simplest workaround is to provide sources through a `filegroup` or similar.)
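The `filegroup` workaround for test sources might look like the sketch below. The package contents and target names are hypothetical; `gen_cmake_file_lists` and its `src_libs` attribute are used as they appear elsewhere in this document.

```starlark
load("@rules_cc//cc:defs.bzl", "cc_test")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

# Hypothetical test target, evaluated in Bazel's test configuration.
cc_test(
    name = "example_test",
    srcs = ["example_test.cc"],
)

# The same sources, exposed without going through the cc_test rule.
filegroup(
    name = "test_srcs",
    srcs = glob(["*_test.cc"]),
)

# The file list refers to the filegroup rather than to the cc_test, so the
# cc_test target is never handed to cc_dist_library or its aspect.
gen_cmake_file_lists(
    name = "example_test_lists",
    out = "example_test_lists.cmake",
    src_libs = {
        ":test_srcs": "example_test",
    },
)
```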
Lists of input files are generated by Bazel in a format that can be imported to other build systems. Currently only CMake-style files can be generated.
The lists of files are derived from Bazel build targets. The sources can be:

* `cc_dist_library` rules (as described above)
* `proto_library` rules
* `filegroup` rules
* `pkg_files` or `pkg_filegroup` rules from https://github.com/bazelbuild/rules_pkg

For example:

    $ cat gen_file_lists_example/BUILD.bazel
    load("@protobuf//bazel:proto_library.bzl", "proto_library")
    load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

    filegroup(
        name = "doc_files",
        srcs = [
            "README.md",
            "englilsh_paper.md",
        ],
    )

    proto_library(
        name = "message",
        srcs = ["message.proto"],
    )

    gen_cmake_file_lists(
        name = "source_lists",
        out = "source_lists.cmake",
        src_libs = {
            ":doc_files": "docs",
            ":message": "buff",
            "//cc_dist_library_example:lib": "distlib",
        },
    )

    $ bazel build gen_file_lists_example:source_lists
    $ cat bazel-bin/gen_file_lists_example/source_lists.cmake
    # Auto-generated by //gen_file_lists_example:source_lists
    #
    # This file contains lists of sources based on Bazel rules. It should
    # be included from a hand-written CMake file that defines targets.
    #
    # Changes to this file will be overwritten based on Bazel definitions.

    if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
      include_guard()
    endif()

    # //gen_file_lists_example:doc_files
    set(docs_files
      gen_file_lists_example/README.md
      gen_file_lists_example/englilsh_paper.md
    )

    # //gen_file_lists_example:message
    set(buff_proto_srcs
      gen_file_lists_example/message.proto
    )

    # //gen_file_lists_example:message
    set(buff_srcs
      gen_file_lists_example/message.proto.pb.cc
    )

    # //gen_file_lists_example:message
    set(buff_hdrs
      gen_file_lists_example/message.proto.pb.h
    )

    # //gen_file_lists_example:message
    set(buff_files
      gen_file_lists_example/message-descriptor-set.proto.bin
    )

    # //cc_dist_library_example:lib
    set(distlib_srcs
      cc_dist_library_example/a.cc
      cc_dist_library_example/b.cc
    )

    # //cc_dist_library_example:lib
    set(distlib_hdrs

    )

A hand-written CMake build rule could then use the generated file to define libraries, such as:

    include(source_lists.cmake)
    add_library(distlib ${distlib_srcs} ${buff_srcs})

The main C++ runtimes (lite and full) and the Protobuf compiler use their corresponding `cc_dist_library` rules to generate file lists. For `proto_library` targets, the file list generation can extract the source files directly. For other targets, notably `cc_test` targets, the file list generators use `filegroup` rules.
In general, adding new targets to a non-Bazel build system in Protobuf (or adding a new build system altogether) requires some one-time setup:

* A rule that generates file lists for the new build system needs to be defined in `//pkg:build_systems.bzl`. Most of the implementation is shared, but a “fragment generator” is needed to declare a file list variable, and the rule type itself has to be defined and call the shared implementation.

When files are added or deleted, or when the Protobuf Bazel structure is changed, these changes may need to be reflected in the file list logic. These are some example scenarios:

* Files are added to or removed from the `srcs` of an existing `cc_library`: no changes needed. If the `cc_library` is already part of a `cc_dist_library`, then regenerating the source lists will reflect the change.
* A `cc_library` is added: the new target may need to be added to the Protobuf `cc_dist_library` targets, as appropriate.
* A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted target, then a build-time error will result. The library needs to be removed from the `cc_dist_library`.
* A `cc_test` is added or deleted: test sources are handled by `filegroup` rules defined in the same package as the `cc_test` rule. The `filegroup`s are usually given a name like `"test_srcs"`, and often use `glob()` to find sources. This means that adding or removing a test may not require any extra work, but this can be verified within the same package as the test rule.
* A test-only proto file is added: the `proto_library` might need to be added to the file list map in `//pkg:BUILD.bazel`, and then the file added to various build systems. However, most test-only protos are already exposed through libraries like `//src/google/protobuf:test_protos`.

If there are changes, then the regenerated file lists need to be copied back into the repo. That way, the corresponding build systems can be used with a git checkout, without needing to run Bazel first.
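The “fragment generator” mentioned in the one-time setup can be pictured as a small function that renders one file list in the target build system's syntax. The following is a minimal sketch, assuming a CMake-style `set()` variable; the function name and signature are hypothetical and are not taken from `//pkg:build_systems.bzl`.

```starlark
# Hypothetical sketch of a "fragment generator". The helpers that actually
# live in //pkg:build_systems.bzl may look quite different.
def example_cmake_var_fragment(owner, varname, entries):
    """Returns CMake text that sets `varname` to the files in `entries`.

    Args:
      owner: label string of the Bazel target the list came from.
      varname: name of the CMake variable to declare.
      entries: list of workspace-relative file paths.
    """
    lines = ["# " + owner, "set(" + varname]
    lines.extend(["  " + entry for entry in entries])
    lines.append(")")
    return "\n".join(lines) + "\n"
```

Calling `example_cmake_var_fragment("//pkg:example", "example_srcs", ["a.cc", "b.cc"])` would return a comment line naming the owning target, followed by a `set(example_srcs ...)` block with one file per line. A generator for another build system would keep the same inputs and change only the emitted syntax.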
A very similar set of rules is defined in `//pkg` to build source distribution archives for releases. In addition to the full sources, Protobuf releases also include source archives sliced by language, so that, for example, a Ruby-based project can get just the sources needed to build the Ruby runtime. (The per-language slices also include sources needed to build the protobuf compiler, so they all effectively include the C++ runtime.)
These archives are defined using rules from the rules_pkg project. Although they are similar to `cc_dist_library` and the file list generation rules, the goals are different: the build system file lists described above only apply to C++, and are organized according to what should or should not be included in different parts of the build (e.g., no tests are included in the main library). On the other hand, the distribution archives deal with languages other than C++, and contain all the files that need to be distributed as part of a release (even for C++, this is more than just the C++ sources).
While it might be possible to use information from the `CcFileList` and `ProtoFileList` providers to define the distribution files, additional files (such as the various `BUILD.bazel` files) are also needed in the distribution archive. The lists of distribution files can usually be generated by `glob()`, anyhow, so sharing logic with the file list aspects may not be beneficial.
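A `glob()`-based distribution archive built with rules_pkg might be declared along these lines. This is a sketch only: the target names, glob patterns, and archive file name are hypothetical and do not reflect the actual definitions in `//pkg`.

```starlark
load("@rules_pkg//pkg:mappings.bzl", "pkg_files", "strip_prefix")
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Hypothetical file selection. Note that glob() only matches files in this
# package, so each package would contribute its own pkg_files target.
pkg_files(
    name = "example_dist_files",
    srcs = glob([
        "**/*.cc",
        "**/*.h",
        "**/BUILD.bazel",
    ]),
    # Keep each file's workspace-relative path inside the archive.
    strip_prefix = strip_prefix.from_root(""),
)

# Bundles the selected files into a compressed source archive.
pkg_tar(
    name = "example_dist_tar",
    srcs = [":example_dist_files"],
    extension = "tar.gz",
    package_file_name = "example-src.tar.gz",
)
```

Unlike the file list rules, nothing here inspects `cc_library` or `proto_library` targets, which is why build files and other non-source files can be picked up by pattern alone.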
Currently, all of the file lists are checked in. However, it would be possible to build the file lists on-the-fly and include them in the distribution archives, rather than checking them in.