| Architecture of Gazelle |
| ======================= |
| |
| .. All external links are here. |
| |
| .. Godoc links |
| .. _buildifier build: https://godoc.org/github.com/bazelbuild/buildtools/build |
| .. _config: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/config |
| .. _go/build: https://godoc.org/go/build |
| .. _go/parser: https://godoc.org/go/parser |
| .. _merger: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/merger |
| .. _packages: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/packages |
| .. _resolve: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/resolve |
| .. _rules: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/rules |
| .. _CallExpr: https://godoc.org/github.com/bazelbuild/buildtools/build#CallExpr |
| .. _golang.org/x/tools/go/vcs: https://godoc.org/golang.org/x/tools/go/vcs |
| |
| .. Other documentation links |
| .. _buildifier: https://github.com/bazelbuild/buildtools/tree/master/buildifier |
| .. _config_setting: https://docs.bazel.build/versions/master/be/general.html#config_setting |
| .. _Fix command transformations: README.rst#fix-command-transformations |
| .. _full list of directives: README.rst#Directives |
| .. _select: https://docs.bazel.build/versions/master/skylark/lib/globals.html#select |
| |
| .. Issues |
| .. _#5: https://github.com/bazelbuild/bazel-gazelle/issues/5 |
| .. _#7: https://github.com/bazelbuild/bazel-gazelle/issues/7 |
| |
| .. Actual content is below |
| |
| Gazelle is a tool that generates and updates Bazel build files for Go projects |
| that follow the conventional "go build" project layout. It is intended to |
| simplify the maintenance of Bazel Go projects as much as possible. |
| |
| This document describes how Gazelle works. It should help users understand why |
| Gazelle behaves as it does, and it should help developers understand |
| how to modify Gazelle and how to write similar tools. |
| |
| .. contents:: |
| |
| Overview |
| -------- |
| |
| Gazelle generates and updates build files according to the algorithm outlined |
| below. Each step is described in more detail in the sections that follow. |
| |
| * Build a configuration from command line arguments and special comments |
|   in the top-level build file. See Configuration_. |
| |
| * For each directory in the repository: |
| |
|   * Read the build file if one is present. |
| |
|   * If the build file should be updated (based on configuration): |
| |
|     * Apply transformations to the build file to migrate away from deprecated |
|       APIs. See `Fixing build files`_. |
| |
|     * Scan the source files and collect metadata needed to generate rules |
|       for the directory. See `Scanning source files`_. |
| |
|     * Generate new rules from the build metadata collected earlier. See |
|       `Generating rules`_. |
| |
|     * Merge the new rules into the directory's build file. Delete any rules |
|       which are now empty. See `Merging and deleting rules`_. |
| |
|   * Add the library rules in the directory's build file to a global table, |
|     indexed by import path. |
| |
| * For each updated build file: |
| |
|   * Use the library table to map import paths to Bazel labels for rules that |
|     were added or merged earlier. See `Resolving dependencies`_. |
| |
|   * Merge the resolved rules back into the file. |
| |
|   * Format the file using buildifier_ and emit it according to the output mode: |
|     write to disk, print the whole file, or print the diff. |
| |
| Configuration |
| ------------- |
| |
| Godoc: config_ |
| |
| Gazelle stores configuration information in ``Config`` objects. These objects |
| contain settings that affect the behavior of most packages in the program. |
| For example: |
| |
| * The list of directories that Gazelle should update. |
| * The path of the repository root directory. Bazel package names are based |
| on paths relative to this location. |
| * The current import path prefix and the directory where it was set. |
| Gazelle uses this to infer import paths for ``go_library`` rules. |
| * A list of build tags that Gazelle considers to be true on all platforms. |
| |
| ``Config`` objects apply to individual directories. Each directory inherits |
| the ``Config`` from its parent. Values in a ``Config`` may be modified within |
| a directory using *directives* written in the directory's build file. A |
| directive is a special comment formatted like this: |
| |
| :: |
| |
| # gazelle:key value |
| |
| Here are a few examples. See the `full list of directives`_. |
| |
| * ``# gazelle:prefix`` - sets the Go import path prefix for the current |
| directory. |
| * ``# gazelle:build_tags`` - sets the list of build tags which Gazelle considers |
| to be true on all platforms. |
| |
| There are a few directives which are not applied to the ``Config`` object but |
| are interpreted directly in packages where they are relevant. |
| |
| * ``# gazelle:ignore`` - the build file should not be updated by Gazelle. |
| Gazelle may still index its contents so it can resolve dependencies in other |
| build files. |
| * ``# gazelle:exclude path/to/file`` - the named file should not be read by |
| Gazelle and should not be included in ``srcs`` lists. If this refers to |
| a directory, Gazelle won't recurse into the directory. This directive may |
| appear multiple times. |
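| |
| As a rough illustration of the directive format, the Go sketch below extracts |
| the key and value from a single comment line. The ``directive`` type and the |
| ``parseDirective`` function are hypothetical names chosen for this example; |
| they are not Gazelle's actual parser. |
| |

```go
package main

import (
	"fmt"
	"strings"
)

// directive is a hypothetical representation of a "# gazelle:key value" comment.
type directive struct {
	Key, Value string
}

// parseDirective reports whether line is a directive comment and, if so,
// returns its key and value. The value may be empty (e.g. "# gazelle:ignore").
func parseDirective(line string) (directive, bool) {
	text := strings.TrimSpace(line)
	if !strings.HasPrefix(text, "#") {
		return directive{}, false
	}
	text = strings.TrimSpace(strings.TrimPrefix(text, "#"))
	if !strings.HasPrefix(text, "gazelle:") {
		return directive{}, false
	}
	fields := strings.SplitN(strings.TrimPrefix(text, "gazelle:"), " ", 2)
	if fields[0] == "" {
		return directive{}, false
	}
	d := directive{Key: fields[0]}
	if len(fields) == 2 {
		d.Value = strings.TrimSpace(fields[1])
	}
	return d, true
}

func main() {
	for _, line := range []string{
		"# gazelle:prefix example.com/project",
		"# gazelle:ignore",
		"# an ordinary comment",
	} {
		if d, ok := parseDirective(line); ok {
			fmt.Printf("key=%q value=%q\n", d.Key, d.Value)
		}
	}
}
```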
| |
| Fixing build files |
| ------------------ |
| |
| Godoc: merger_ |
| |
| From time to time, APIs in rules_go are changed or updated. Gazelle helps |
| users stay up to date with these changes by automatically fixing deprecated |
| usage. |
| |
| Minor fixes are applied by Gazelle automatically every time it runs. However, |
| some fixes may delete or rename existing rules. Users must run ``gazelle fix`` |
| to apply these fixes. By default, Gazelle will only *warn* users that |
| ``gazelle fix`` should be run. |
| |
| Here are a few of the fixes Gazelle performs. See `Fix command transformations`_ |
| for a full list. |
| |
| * **Squash cgo libraries:** Gazelle will remove ``cgo_library`` rules and |
| merge their attributes into ``go_library`` rules that reference them. |
| This is a major fix and is only applied with ``gazelle fix``. |
| * **Migrate library attributes:** Gazelle replaces ``library`` attributes |
| with ``embed`` attributes. The only difference between these is that |
| ``library`` (which is now deprecated) accepts a single label, while ``embed`` |
| accepts a list. This is a minor fix and is always applied. |
| |
| Users can prevent Gazelle from modifying rules, attributes, or individual |
| values by writing ``# keep`` comments above them. |
| |
| Scanning source files |
| --------------------- |
| |
| Godoc: packages_ |
| |
| Nearly all of the information needed to build a program with the standard Go SDK |
| is implied by directory structure, file names, and file contents. This is why |
| ``go build`` doesn't require any sort of build file. The `go/build`_ package in |
| the standard library collects this information. |
| |
| Unfortunately, `go/build`_ can only collect information for one platform at |
| a time. Gazelle needs to generate build files that work on all platforms, so |
| we have our own implementation of this logic. |
| |
| Information extracted from files |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Gazelle extracts build metadata from source files and contents in much the |
| same way that the standard `go/build`_ package does. It gets the following |
| information from file names: |
| |
| * File extension (e.g., .go, .c, .proto). Normally, only .go, .s, and .h files |
| are included in Go rules. If any cgo code is present, then C/C++ files are |
| also included. .proto files are also used to build proto rules. Other files |
| (e.g., .txt) are ignored. |
| * Test suffix. For example, if a file is named ``foo_test.go``, it will be |
| included in a test target instead of a library or binary target. |
| * OS and architecture suffixes. For example, a file named ``foo_linux_amd64.go`` |
| will be listed in the ``linux_amd64`` section of the target it belongs to. |
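| |
| The file name rules above can be sketched in Go roughly as follows. This is a |
| simplified illustration, not Gazelle's code: ``fileInfo``, ``fileNameInfo``, |
| and the short ``knownOS`` and ``knownArch`` tables are assumptions made for |
| the example, and the real lists of platforms are much longer. |
| |

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Small, illustrative subsets of the OS and architecture names Go knows about.
var (
	knownOS   = map[string]bool{"darwin": true, "linux": true, "windows": true}
	knownArch = map[string]bool{"386": true, "amd64": true, "arm64": true}
)

// fileInfo is a hypothetical record of what a file name alone tells us.
type fileInfo struct {
	Ext    string
	IsTest bool
	OS     string
	Arch   string
}

// fileNameInfo extracts the extension, test suffix, and platform suffixes.
func fileNameInfo(name string) fileInfo {
	info := fileInfo{Ext: path.Ext(name)}
	stem := strings.TrimSuffix(name, info.Ext)
	if info.Ext == ".go" && strings.HasSuffix(stem, "_test") {
		info.IsTest = true
		stem = strings.TrimSuffix(stem, "_test")
	}
	// The first element is the base name and is never a platform suffix,
	// so a file named "linux.go" is not platform-specific.
	parts := strings.Split(stem, "_")
	n := len(parts)
	switch {
	case n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]]:
		info.OS, info.Arch = parts[n-2], parts[n-1]
	case n >= 2 && knownOS[parts[n-1]]:
		info.OS = parts[n-1]
	case n >= 2 && knownArch[parts[n-1]]:
		info.Arch = parts[n-1]
	}
	return info
}

func main() {
	names := []string{"foo.go", "foo_test.go", "foo_linux_amd64.go", "foo_windows_test.go"}
	for _, name := range names {
		fmt.Printf("%-22s %+v\n", name, fileNameInfo(name))
	}
}
```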
| |
| Gazelle gets the following information from file contents: |
| |
| * Package name. This is syntactically the first part of every .go file. All |
| files in the same directory must have the same package name (except for |
| external test sources, which have a package name ending with ``_test``). If |
| there are multiple packages, Gazelle will choose one that matches the |
| directory name (if present) or report an error. |
| * Imported libraries. Go import paths usually look like URLs without a scheme |
| (for example, ``golang.org/x/tools/go/vcs``). Imports in platform-specific |
| source files are also platform-specific. |
| * Build tags. The Go toolchain recognizes comments beginning with ``// +build`` |
| before the package declaration. These tags tell the build system that a file |
| should only be built for specific platforms. See `this article |
| <https://dave.cheney.net/2013/10/12/how-to-use-conditional-compilation-with-the-go-build-tool>`_ |
| for more information. |
| * Whether cgo code is present. This affects how packages are built and |
| whether C/C++ files are included. |
| * C/C++ compile and link options (specified in ``#cgo`` directives in cgo |
| comments). These may be platform-specific. |
| |
| In most cases, only the top of the file is parsed. For Go files, we use the |
| standard `go/parser`_ package. For proto files, we use regular expressions that |
| match ``package``, ``go_package``, and ``import`` statements. |
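| |
| Parsing only the top of a Go file can be done with the ``ImportsOnly`` mode of |
| `go/parser`_. The sketch below shows the general idea; ``scanGoFile`` is a |
| hypothetical helper written for this example, and Gazelle's own scanner also |
| collects build tags and cgo information, which are omitted here. |
| |

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// scanGoFile parses only the package clause and import declarations of a Go
// source file and returns the package name and the imported paths.
func scanGoFile(filename string, src []byte) (string, []string, error) {
	fset := token.NewFileSet()
	// ImportsOnly tells the parser to stop after the import declarations.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return "", nil, err
	}
	var imports []string
	for _, spec := range f.Imports {
		// Path.Value is a quoted string literal, so unquote it.
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return "", nil, err
		}
		imports = append(imports, p)
	}
	return f.Name.Name, imports, nil
}

func main() {
	src := []byte(`package foo

import (
	"fmt"

	"github.com/pkg/errors"
)

func F() error { return errors.New(fmt.Sprint("example")) }
`)
	name, imports, err := scanGoFile("foo.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("package:", name)
	fmt.Println("imports:", imports)
}
```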
| |
| The ``Package`` object |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Gazelle stores build metadata in a ``Package`` object. Currently, we only |
| support one ``Package`` per directory (which is also what the Go SDK supports), |
| but this will be expanded in the future. ``Package`` objects contain some |
| top-level metadata (like the package name and directory path), along with |
| several target objects (``GoTarget`` and ``ProtoTarget``). |
| |
| Target objects correspond directly to rules that will be generated later. They |
| store lists of sources, imports, and flags in ``PlatformStrings`` objects. |
| |
| ``PlatformStrings`` objects store strings in four sections: a generic list, an |
| OS-specific dictionary, an architecture-specific dictionary, and an |
| OS-and-architecture-specific dictionary. The keys in the dictionaries are OS |
| names, architecture names, or OS-and-architecture pairs; the values are lists of |
| strings. The same string may not appear more than once in a list and may not |
| appear in more than one section. This is due to a Bazel requirement: the same |
| label may not appear more than once in a ``deps`` list. |
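| |
| To make the four sections concrete, here is a simplified Go sketch. The |
| ``platformStrings`` type, its field names, and the ``validate`` method are |
| assumptions made for illustration and may differ from the real |
| ``PlatformStrings`` type. The check reflects one reading of the rule above: |
| no repeats within a list, and no string shared between sections. |
| |

```go
package main

import "fmt"

// platformStrings is a hypothetical, simplified stand-in for PlatformStrings.
type platformStrings struct {
	Generic []string            // applies to all platforms
	OS      map[string][]string // keyed by OS name, e.g. "linux"
	Arch    map[string][]string // keyed by architecture name, e.g. "amd64"
	OSArch  map[string][]string // keyed by OS and architecture, e.g. "linux_amd64"
}

// values returns the lists stored in a dictionary section.
func values(m map[string][]string) [][]string {
	var lists [][]string
	for _, list := range m {
		lists = append(lists, list)
	}
	return lists
}

// validate checks that no string is repeated within a list and that no string
// appears in more than one section.
func (ps platformStrings) validate() error {
	sections := []struct {
		name  string
		lists [][]string
	}{
		{"generic", [][]string{ps.Generic}},
		{"OS", values(ps.OS)},
		{"arch", values(ps.Arch)},
		{"OS-and-arch", values(ps.OSArch)},
	}
	owner := make(map[string]string) // string -> section that contains it
	for _, sec := range sections {
		inSection := make(map[string]bool)
		for _, list := range sec.lists {
			inList := make(map[string]bool)
			for _, s := range list {
				if inList[s] {
					return fmt.Errorf("%q is repeated within a %s list", s, sec.name)
				}
				inList[s] = true
				inSection[s] = true
			}
		}
		for s := range inSection {
			if other, ok := owner[s]; ok {
				return fmt.Errorf("%q appears in both the %s and %s sections", s, other, sec.name)
			}
			owner[s] = sec.name
		}
	}
	return nil
}

func main() {
	srcs := platformStrings{
		Generic: []string{"terminal.go"},
		OS: map[string][]string{
			"darwin":  {"util.go", "util_bsd.go"},
			"linux":   {"util.go", "util_linux.go"},
			"windows": {"util_windows.go"},
		},
	}
	fmt.Println("validation error:", srcs.validate())
}
```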
| |
| Generating rules |
| ---------------- |
| |
| Godoc: rules_ |
| |
| Once build metadata has been extracted from the sources in a directory, |
| Gazelle generates rules for building those sources. |
| |
| Generated rules are formatted as CallExpr_ objects. CallExpr_ is defined in the |
| `buildifier build`_ library. This is the same library used to parse and format |
| build files. This lets us manipulate newly generated rules and existing rules |
| with the same code. |
| |
| We may generate the following rules: |
| |
| * ``proto_library`` and ``go_proto_library`` are generated if there was at |
| least one .proto source file. |
| * ``go_library`` is generated if there was at least one non-test source. This |
| may embed the ``go_proto_library`` if there was one. |
| * ``go_test`` rules are generated for internal and external tests. Internal |
| tests embed the ``go_library`` while external tests depend on the |
| ``go_library`` as a separate package. |
| * ``go_binary`` is generated if the package name was ``main``. It embeds the |
| ``go_library``. |
| |
| Rules are named according to a pluggable naming policy, but there is currently |
| only one policy: libraries are named ``go_default_library``, tests are |
| named ``go_default_test``, and binaries are named after the directory. The |
| ``go_default_library`` name is an historical artifact from before we had |
| index-based dependency resolution. We'll need to move away from this naming |
| scheme in the future (`#5`_) before we support multiple packages (`#7`_). |
| |
| Sources, imports, and flags within each target are converted to expressions in a |
| straightforward fashion. The lists within ``PlatformStrings`` are converted to |
| list expressions. Dictionaries are converted to `select`_ expressions (when |
| Bazel evaluates a `select`_ expression, it chooses one of several provided |
| lists, based on `config_setting`_ rules). Lists and select expressions may be |
| added together. For example: |
| |
| .. code:: bzl |
| |
|   go_library( |
|       name = "go_default_library", |
|       srcs = [ |
|           "terminal.go", |
|       ] + select({ |
|           "@io_bazel_rules_go//go/platform:darwin": [ |
|               "util.go", |
|               "util_bsd.go", |
|           ], |
|           "@io_bazel_rules_go//go/platform:linux": [ |
|               "util.go", |
|               "util_linux.go", |
|           ], |
|           "@io_bazel_rules_go//go/platform:windows": [ |
|               "util_windows.go", |
|           ], |
|           "//conditions:default": [], |
|       }), |
|       ... |
|   ) |
| |
| At this point, Gazelle does not have enough information to generate expressions |
| for ``deps`` attributes. We only have a list of import strings extracted from |
| source files. These imports are stored temporarily in a special |
| ``_gazelle_imports`` attribute in each rule. Later, the imports are converted to |
| Bazel labels (see `Resolving dependencies`_), and this attribute is replaced |
| with ``deps``. |
| |
| Merging and deleting rules |
| -------------------------- |
| |
| Godoc: merger_ |
| |
| Merging is the process of combining generated rules with the corresponding |
| rules in an existing build file. If no build file exists in a directory, a |
| new file is created with generated rules, and no merging is performed. |
| |
| Merging occurs in two phases: pre-resolve, and post-resolve. This is due to an |
| interdependence with dependency resolution. Dependency resolution uses a table |
| of *merged* library rules, so it can't be performed until the pre-resolve merge |
| has occurred. After dependency resolution, we need to merge newly generated |
| ``deps`` attributes; this is done in the post-resolve merge. The two phases use |
| the same algorithm. |
| |
| During the merge process, Gazelle attempts to match generated rules with |
| existing rules that have the same name and same kind. Rules are only merged if |
| both name and kind match. If an existing rule has the same name as a generated |
| rule but a different kind, the generated rule will not be merged. If no |
| existing rule matches a generated rule, the generated rule is simply appended to |
| the end of the file. Existing rules that don't match any generated rule are not |
| modified. |
| |
| When Gazelle identifies a matching pair of rules, it combines each attribute |
| according to the algorithm below. If an attribute is present in the generated |
| rule but not in the existing rule, it is copied to the merged rule verbatim. If |
| an attribute is present in the existing rule but not the generated rule, Gazelle |
| behaves as if the generated attribute were present but empty. |
| |
| * For each value in the existing rule's attribute: |
| |
|   * If the value also appears in the generated rule's attribute or is marked |
|     with a ``# keep`` comment, preserve it. Otherwise, delete it. |
| |
| * For each value in the generated rule's attribute: |
| |
|   * If the value appears in the existing rule's attribute, ignore it. |
|     Otherwise, add it to the merged rule. |
| |
| * If the merged attribute is empty, delete it. |
| |
| When a value is present in both the existing and generated attributes, we use |
| the existing value instead of the generated value, since this preserves |
| comments. |
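| |
| The per-attribute merge can be sketched in Go as shown below. This is a |
| minimal illustration that treats attribute values as plain strings; |
| ``mergeValues`` and its ``keep`` parameter are hypothetical, and the real |
| merger operates on build file expressions rather than strings. |
| |

```go
package main

import "fmt"

// mergeValues combines the values of one attribute. keep holds the existing
// values that are marked with a "# keep" comment. An empty result means the
// merged attribute would be deleted.
func mergeValues(existing, generated []string, keep map[string]bool) []string {
	inGenerated := make(map[string]bool, len(generated))
	for _, v := range generated {
		inGenerated[v] = true
	}

	var merged []string
	seen := make(map[string]bool)
	// Preserve existing values that are still generated or marked "# keep".
	for _, v := range existing {
		seen[v] = true
		if inGenerated[v] || keep[v] {
			merged = append(merged, v)
		}
	}
	// Add generated values that were not in the existing attribute.
	for _, v := range generated {
		if !seen[v] {
			seen[v] = true
			merged = append(merged, v)
		}
	}
	return merged
}

func main() {
	existing := []string{"a.go", "old.go", "custom.go"}
	generated := []string{"a.go", "b.go"}
	keep := map[string]bool{"custom.go": true}
	fmt.Println(mergeValues(existing, generated, keep)) // prints [a.go custom.go b.go]
}
```

| Here ``old.go`` is dropped because it is no longer generated, ``custom.go`` |
| survives because of its ``# keep`` marker, and ``b.go`` is appended as a new |
| value. |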
| |
| Some attributes are considered *unmergeable*, for example, ``visibility`` and |
| ``gc_goopts``. Gazelle may add these attributes to existing rules if they are |
| not already present, but existing values won't be modified or deleted. |
| |
| Preserving customizations |
| ~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Gazelle has several mechanisms for preserving manual modifications to build |
| files. Some of these mechanisms work automatically; others require explicit |
| comments. |
| |
| * Gazelle will not modify or delete rules that don't appear to have been |
| generated by Gazelle. |
| * As mentioned above, some attributes are considered unmergeable. Gazelle may |
| set initial values for these but won't delete or replace existing values. |
| * ``# keep`` comments may be attached to any rule, attribute, or value |
| to prevent Gazelle from modifying it. |
| * ``# gazelle:exclude <file>`` directives can be used to prevent Gazelle from |
| adding files to source lists (for example, checked-in .pb.go files). They |
| can also prevent Gazelle from recursing into directories that contain |
| unbuildable code (e.g., ``testdata``). |
| * ``# gazelle:ignore`` directives prevent Gazelle from making any modifications |
| to build files that contain them. |
| |
| Deleting rules |
| ~~~~~~~~~~~~~~ |
| |
| Deletion is a special case of the merging algorithm. |
| |
| When Gazelle generates rules for a package (see `Generating rules`_), it |
| actually produces two lists of rules: a list of rules for buildable targets, |
| and a list of empty rules that may be deleted. The empty rules have no |
| attributes other than ``name``. |
| |
| The empty rules are merged using the same algorithm as the other generated |
| rules. If, after merging, an empty rule has no attributes that would make the |
| rule buildable (for example, ``srcs``, or ``deps``), the rule will be deleted. |
| |
| Resolving dependencies |
| ---------------------- |
| |
| Godoc: resolve_ |
| |
| When Gazelle generates rules for a package (see `Generating |
| rules`_), it stores names of the libraries imported by each rule in a special |
| ``_gazelle_imports`` attribute. During dependency resolution, Gazelle maps these |
| imports to Bazel labels and replaces ``_gazelle_imports`` with ``deps``. |
| |
| Before dependency resolution starts, Gazelle builds a table of all known |
| libraries. This includes ``go_library``, ``go_proto_library``, and |
| ``proto_library`` rules. The table is populated by scanning build files after |
| the pre-resolve merge, so existing and newly generated rules are included |
| in the table, and deleted rules are excluded. Once all library rules have been |
| added, Gazelle indexes the table by language-specific import path. |
| |
| Gazelle resolves each import string in ``_gazelle_imports`` as follows: |
| |
| * If the import is part of the standard library, it is dropped. Standard |
|   library dependencies are implicit. |
| |
| * If the import is provided by exactly one rule in the library table, the label |
|   for that rule is used. |
| |
| * If the import is provided by multiple libraries, we attempt to resolve |
|   the ambiguity. |
| |
|   * For Go, we apply the vendoring algorithm. Vendored libraries aren't visible |
|     outside of the vendor directory's parent. |
| |
|   * Go libraries that are embedded by other Go libraries are not considered. |
|     Embedded libraries may be incomplete. |
| |
|   * When an ambiguity can't be resolved, Gazelle logs an error and skips |
|     the dependency. |
| |
| * If the import is not provided by any rule in the library table, we attempt |
|   to resolve the dependency using heuristics: |
| |
|   * If the import path starts with the current prefix (set with a |
|     ``# gazelle:prefix`` directive or on the command line), we construct a label |
|     by concatenating the prefix directory and the portion of the import path |
|     below the prefix into a package name. |
| |
|   * Otherwise, the import path is considered external and is resolved |
|     according to the external mode set on the command line. |
| |
|     * In ``external`` mode, Gazelle determines the portion of the import path |
|       that corresponds to a repository using `golang.org/x/tools/go/vcs`_. This |
|       part of the path is converted into a repository name (for example, |
|       ``golang.org/x/tools`` becomes ``@org_golang_x_tools``), and the rest is |
|       converted to a package name. |
| |
|     * In ``vendored`` mode, Gazelle constructs a label by prepending ``vendor/`` |
|       to the import path. |
| |
| Note that ``visibility`` attributes are not considered when resolving imports. |
| This was part of an initial prototype, but it was confusing in many situations. |
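| |
| The fallback heuristics can be sketched in Go as follows. All function names |
| here are hypothetical. The repository root is passed in as a parameter rather |
| than looked up over the network, the ``go_default_library`` rule name follows |
| the naming policy described in `Generating rules`_, and the repository naming |
| scheme (reversed host components joined with underscores) is an assumption |
| based on the ``org_golang_x_tools`` style of name. |
| |

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveWithPrefix builds a label for an import path at or below the current
// prefix. prefixDir is the directory (relative to the repository root) where
// the prefix was set.
func resolveWithPrefix(imp, prefix, prefixDir string) (string, bool) {
	if imp != prefix && !strings.HasPrefix(imp, prefix+"/") {
		return "", false
	}
	rel := strings.TrimPrefix(strings.TrimPrefix(imp, prefix), "/")
	return "//" + path.Join(prefixDir, rel) + ":go_default_library", true
}

// repoName converts a repository root such as "golang.org/x/tools" into a
// repository name such as "org_golang_x_tools".
func repoName(repoRoot string) string {
	parts := strings.Split(repoRoot, "/")
	host := strings.Split(parts[0], ".")
	for i, j := 0, len(host)-1; i < j; i, j = i+1, j-1 {
		host[i], host[j] = host[j], host[i]
	}
	name := strings.Join(append(host, parts[1:]...), "_")
	return strings.NewReplacer("-", "_", ".", "_").Replace(name)
}

// resolveExternal builds a label in an external repository. repoRoot is
// assumed to have been determined already (Gazelle uses go/vcs for this).
func resolveExternal(imp, repoRoot string) string {
	rel := strings.TrimPrefix(strings.TrimPrefix(imp, repoRoot), "/")
	return "@" + repoName(repoRoot) + "//" + rel + ":go_default_library"
}

// resolveVendored builds a label under the top-level vendor directory.
func resolveVendored(imp string) string {
	return "//vendor/" + imp + ":go_default_library"
}

func main() {
	if label, ok := resolveWithPrefix("example.com/project/foo/bar", "example.com/project", ""); ok {
		fmt.Println(label)
	}
	fmt.Println(resolveExternal("golang.org/x/tools/go/vcs", "golang.org/x/tools"))
	fmt.Println(resolveVendored("github.com/pkg/errors"))
}
```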
| |
| Building and running Gazelle |
| ---------------------------- |
| |
| Gazelle is a regular Go program. It can be built, installed, and run without |
| Bazel, using the regular Go SDK. |
| |
| .. code:: bash |
| |
| $ go install github.com/bazelbuild/bazel-gazelle/cmd/gazelle@latest |
| $ gazelle -go_prefix example.com/project |
| |
| We lightly discourage this method of running Gazelle. All developers on a |
| project should use the same version of Gazelle to ensure the build files |
| they generate are consistent. The easiest way to accomplish this is to build |
| and run Gazelle through Bazel. Gazelle may be added to a WORKSPACE file, |
| built as a normal ``go_binary``, then installed or run from the ``bazel-bin/`` |
| directory. |
| |
| .. code:: bash |
| |
| $ bazel build @bazel_gazelle//cmd/gazelle |
| $ bazel-bin/external/bazel_gazelle/cmd/gazelle/gazelle -go_prefix example.com/project |
| |
| It's usually better to invoke Gazelle through a wrapper script though. This |
| saves typing and ensures Gazelle is run with a consistent set of arguments. |
| We provide a Bazel rule that generates such a wrapper script. Developers may |
| add a snippet like the one below to a build file: |
| |
| .. code:: bzl |
| |
|   load("@bazel_gazelle//:def.bzl", "gazelle") |
| |
|   gazelle( |
|       name = "gazelle", |
|       command = "fix", |
|       external = "vendored", |
|       prefix = "example.com/project", |
|   ) |
| |
| This script may be built and executed in a single command with ``bazel run``. |
| |
| .. code:: bash |
| |
| $ bazel run //:gazelle |
| |
| This is the most convenient way to run Gazelle, and it's what we recommend to |
| users. However, there are two issues with running Gazelle in this |
| fashion. First, binaries executed by ``bazel run`` are run in the Bazel |
| execroot, not the user's current directory. The wrapper script uses a hack |
| (dereferencing symlinks) to jump to the top of the workspace source tree before |
| running Gazelle. Second, ``bazel run`` holds a lock on the Bazel output |
| directory. This means Gazelle cannot invoke Bazel without deadlocking. Commands |
| like ``bazel query`` would be helpful for detecting generated code, but it's not |
| safe to use them. |
| |
| To avoid these limitations, the wrapper script may be copied to the workspace |
| and optionally checked into version control. When the wrapper script is run |
| directly (without ``bazel run``), it will rebuild itself to ensure no changes |
| are needed. If the rebuilt script differs from the running script, it will |
| prompt the user to copy the rebuilt script into the workspace again. |
| |
| .. code:: bash |
| |
| $ bazel build //:gazelle |
| Target //:gazelle up-to-date: |
| bazel-bin/gazelle.bash |
| ____Elapsed time: 1.326s, Critical Path: 0.00s |
| $ cp bazel-bin/gazelle.bash gazelle.bash |
| $ ./gazelle.bash |
| |
| Dependencies |
| ------------ |
| |
| Gazelle has the following dependencies: |
| |
| github.com/bazelbuild/bazel-skylib |
|   Skylark utility library used to generate the wrapper script in the |
|   ``gazelle`` rule. |
| github.com/bazelbuild/buildtools/build |
|   Used to parse and rewrite build files. |
| github.com/bazelbuild/rules_go |
|   Used to build and test Gazelle through Bazel. Gazelle can also be built on |
|   its own with the Go SDK. |
| golang.org/x/tools/go/vcs |
|   Used during dependency resolution to determine the repository prefix for a |
|   given import path. This uses the network. |