Design: Heuristic Board Discovery and Filtering

Background

Desired User Experience

The goal of the Zephyr-Bazel integration is to provide a seamless developer experience that mirrors the flexibility of the native Zephyr build system while leveraging Bazel's correctness and speed. Ideally, a user should be able to add a new application or a new board to the workspace, and Bazel should automatically discover the new combinations without requiring manual registration in MODULE.bazel or other configuration files. Developers expect to build any app for any board using a simple command like bazel build //apps/blinky --platforms=//boards/my_board.

Limitations of Bazel and Bzlmod

While the current design achieves this automatic discovery by eagerly declaring repositories for the Cartesian product of all apps and boards in a module extension, it hits a hard scaling limit due to Bazel's Bzlmod lockfile implementation:

Lockfile Bloat: Module extensions must declare all repositories they produce during the loading phase. For $N$ apps and $M$ boards, this results in $N \times M$ repository declarations. Bazel records all these declarations in the MODULE.bazel.lock file.
Performance Degradation: With a large number of apps and boards, the lockfile grows to tens of megabytes. Reading, parsing, and updating this massive JSON file during the analysis phase causes severe performance degradation, even when the lockfile is only being checked and not updated.
Lazy Execution vs. Eager Declaration: Bazel's repository rules are executed lazily (only when needed), but they are declared eagerly by the extension. The performance penalty is paid in the loading/analysis phase simply by having the repositories registered in the lockfile, regardless of whether they are part of the current build graph.

To overcome these limitations, we need a mechanism to prune the number of declared repositories during the loading phase while maintaining as much of the automatic discovery experience as possible.

Repository Structure Alternatives

One Repository per Application

We considered moving from a repository-per-combination model to a repository-per-app model. Two variants were explored:

Eager Evaluation per App: The repository rule runs the Zephyr scripts for every board the app supports. This was not chosen because sequentially parsing Kconfig for hundreds or thousands of in-tree boards during the fetch phase would cause unacceptable delays.
Lazy Evaluation via Actions: The repository rule generates targets that parse configurations lazily through standard Bazel actions. This was not chosen because action outputs are not available during analysis, so Kconfig symbols could no longer drive select(). Losing select() is not acceptable: it is the core mechanism for correctly generating the build of all Zephyr kernel, driver, and subsystem code from the configuration.

One Repository for All Applications

This option is even worse than the current design. It would require a single repository to handle all applications and boards. This would either force eager evaluation of the entire Cartesian product of apps and boards (which is impossible at scale) or require complex custom rules that still cannot overcome Bazel's limitations regarding dynamic target generation.

Conclusion on Structure

Because we cannot sacrifice lazy evaluation of the matrix, and we cannot lose the ability to use Kconfig symbols in Bazel select() statements, the existing system of one repository per combination is required.

Board Down-Selection Alternatives

Since the repository-per-combination structure is required, the only way to reduce the lockfile size is to limit the number of combinations declared in the loading phase. We reviewed several methods for down-selecting boards:

1. Explicit Allowlist in MODULE.bazel

Description: Users manually list active boards in MODULE.bazel.
Pros: Keeps the lockfile small and predictable.
Cons: Violates the “Automatic Discovery” goal and requires manual maintenance.

2. Environment Variables

Description: The module extension reads an environment variable (e.g., ZEPHYR_BOARDS) to filter boards.
Pros: Dynamic, good for local development switching.
Cons: Changing the variable invalidates the extension and forces lockfile updates.

3. Smart Heuristics based on App Structure

Description: The extension scans the app's boards/ directory and only generates repositories for boards that have specific configuration files or overlays.
Pros: Maintains automatic discovery while drastically reducing the matrix size.
Cons: Skips boards that might work with defaults (requires a fallback mechanism).

Proposed Solution: Stratified Heuristic Discovery

Overview

The proposed solution adopts a stratified approach to reducing the number of declared repositories. It applies different heuristic rules based on the origin of the assets (In-Tree vs. Out-of-Tree) and the presence of Zephyr metadata files, while maintaining a fallback mechanism.

Instead of generating repositories for the full Cartesian product, the module extension applies the following priority-based rules for each (app, board) combination:

Rule 1 (Out-of-Tree Origin Exemption): If both the application and the board are Out-of-Tree (OOT), the extension generates the repository unconditionally. This assumes the number of custom OOT apps and boards is small, so pruning them is not worth the overhead. If either the application or the board is an in-tree asset (mixed cases), the combination is subject to the rules below.
Rule 2 (Metadata-Driven Filtering, Tests/Samples): If the application directory contains a testcase.yaml or sample.yaml file, the extension parses it. To avoid a Cartesian-product explosion in the lockfile, it only generates repositories for boards explicitly listed in platform_allow. If a test relies on broad dynamic constraints (such as arch_allow, filter, or min_flash) or on exclusions (platform_exclude), the extension skips Rule 2 and falls through to Rule 3.
Rule 3 (Filesystem-Driven Filtering): For all other combinations (including mixed cases involving in-tree boards or in-tree apps), the extension scans the application's boards/ directory and only generates repositories for boards that have specific configuration files (e.g., <board>.conf or <board>.overlay) or revision-specific overrides in that folder.
Rule 4 (Fallback Filter): To support building apps for boards that rely on default configurations without specific files, the extension accepts an explicit list of manual boards via MODULE.bazel or an environment variable. Repositories are generated for these boards for all apps.
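The priority order of the four rules can be sketched in Python as a per-app board selector. This is a minimal sketch, not the extension's code: the Asset type, the select_boards function, and its inputs (the platform_allow set from Rule 2 and the boards/ matches from Rule 3) are hypothetical names, and the real logic would live in Starlark inside setup.bzl. It also assumes that manual entries naming unknown boards are simply ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """An app or board and whether it came from an out-of-tree directory (hypothetical)."""
    name: str
    out_of_tree: bool


def select_boards(app, boards, yaml_allow, boards_dir_hits, manual_boards):
    """Return the sorted board names that get a repository for `app`.

    yaml_allow: boards from platform_allow; empty when Rule 2 does not apply.
    boards_dir_hits: boards with override files in the app's boards/ directory.
    manual_boards: the global Rule 4 fallback list.
    """
    selected = set()
    for board in boards:
        if app.out_of_tree and board.out_of_tree:
            # Rule 1: OOT app x OOT board is always declared.
            selected.add(board.name)
        elif yaml_allow:
            # Rule 2: an explicit platform_allow list decides on its own.
            if board.name in yaml_allow:
                selected.add(board.name)
        elif board.name in boards_dir_hits:
            # Rule 3: the app ships <board>.conf / <board>.overlay overrides.
            selected.add(board.name)
    # Rule 4: manual boards apply to every app (unknown names are ignored here).
    known = {board.name for board in boards}
    selected |= set(manual_boards) & known
    return sorted(selected)


boards = [
    Asset("qemu_x86", out_of_tree=False),
    Asset("nrf52840dk_nrf52840", out_of_tree=False),
    Asset("my_board", out_of_tree=True),
]
hello = Asset("samples/hello_world", out_of_tree=False)
print(select_boards(hello, boards, set(), {"qemu_x86"}, ["nrf52840dk_nrf52840"]))
# prints ['nrf52840dk_nrf52840', 'qemu_x86']
```

Note that the in-tree app never gets my_board: the pair is a mixed case, so it only survives through Rule 3 or Rule 4.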

Pros

Precise Lockfile Reduction: The number of declared repositories scales with the number of combinations that are actually supported, not with the full $N \times M$ product.
High Fidelity for Tests: Using testcase.yaml prevents generating thousands of useless test/board combinations.
Maintains Automatic Discovery: Adding a board file or updating a YAML file automatically triggers discovery.
Preserves select() and Lazy Execution: Keeps the repository-per-combination model.
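The lockfile reduction can be stated as a bound. Writing $B_a$ for the set of boards that Rules 1 to 3 select for app $a$ and $B_{\text{manual}}$ for the Rule 4 list (notation introduced here for illustration):

$$
R_{\text{full}} = N \times M,
\qquad
R_{\text{pruned}} = \sum_{a=1}^{N} \left| B_a \cup B_{\text{manual}} \right| \;\le\; \sum_{a=1}^{N} \lvert B_a \rvert + N \, \lvert B_{\text{manual}} \rvert .
$$

With hypothetical numbers, $N = 200$ apps and $M = 1000$ boards give $R_{\text{full}} = 200{,}000$ declarations, while an average of 5 selected boards per app plus 2 manual boards gives at most $200 \cdot 5 + 200 \cdot 2 = 1{,}400$. Because $B_a$ also contains every OOT board for an OOT app (Rule 1), the bound stays small only while the OOT set is small, which is exactly the assumption Rule 1 makes.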

Cons

Increased Discovery Logic Complexity: The extension must handle YAML parsing and directory scanning.

Detailed Design

1. Discovery Strategy Implementation

The discovery pruning logic will be implemented inside _zephyr_setup_core_impl in setup.bzl. This function is responsible for scanning application directories, applying the heuristic filtering rules, and building the index of (app, board) combinations that survive pruning.

Origin and Metadata Checks

Rule 1: OOT Origin Exemption: For each (app, board) pair, if the app is located in an out-of-tree directory and the board is also an OOT board, the extension unconditionally generates the configuration repository. In mixed cases (where either one is an in-tree asset), the combination is subject to the metadata and filesystem filtering rules below.

An app is considered Out-of-Tree if it originates from a directory explicitly listed in apps_dirs. A board is considered Out-of-Tree if it originates from a directory explicitly listed in boards_dirs. The in-tree boards folder inside zephyr_root is automatically scanned and considered in-tree.
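One subtlety is that "originates from a directory" should be a path-component check rather than a string-prefix check; otherwise apps/ would also claim apps2/. A minimal sketch, assuming asset locations are workspace-relative POSIX paths and that apps_dirs/boards_dirs entries are package labels such as "//apps"; the origin() helper is hypothetical.

```python
from pathlib import PurePosixPath


def origin(asset_dir, oot_dirs):
    """Classify a workspace-relative path as "oot" or "in-tree".

    oot_dirs holds package labels such as "//apps" (from apps_dirs or boards_dirs).
    Anything not under one of them, such as the boards folder inside
    zephyr_root, is treated as in-tree.
    """
    path = PurePosixPath(asset_dir)
    for label in oot_dirs:
        root = PurePosixPath(label.lstrip("/"))
        # Component-wise containment: "apps2/x" is not under "apps".
        if path == root or root in path.parents:
            return "oot"
    return "in-tree"


print(origin("apps/blinky", ["//apps"]))   # prints oot
print(origin("apps2/blinky", ["//apps"]))  # prints in-tree
```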

Rule 2: Metadata-driven Filtering (YAML): Before scanning directories, the extension looks for testcase.yaml or sample.yaml in the app's root. To parse these files, it invokes a small Python helper script. The script is run from inside the zephyr_setup module extension, where the path to @zephyr has already been resolved and recorded in @zephyr_state//:state.json.

To resolve the path to the helper script safely inside the module extension in setup.bzl, the extension should use script_path = mctx.path(Label("@zephyr-bazel//scripts/build:parse_test_metadata.py")).

> [!IMPORTANT]
> To avoid launching the helper once per application (or per combination) and slowing down the loading phase, the script must process all applications in a single invocation. It receives a JSON mapping of application package names to their absolute paths via a --apps-json argument and returns a JSON mapping from each application to its valid boards.

The Python helper script adds @zephyr//scripts/pylib/twister to sys.path at runtime so it can import twister's modules. If twister's dependencies (such as ruamel.yaml) are missing from the Python environment, the script should fall back to a lightweight internal YAML parser that only extracts platform_allow entries, so it stays robust during the Bazel loading phase.
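When `from ruamel.yaml import YAML` raises ImportError, the fallback only needs to recover platform_allow and notice dynamic constraints. A minimal line-based sketch, assuming the metadata uses the common YAML forms listed in the docstring; platform_allow_fallback and the key list in _DYNAMIC are assumptions, and anything more exotic (anchors, multi-line plain scalars) would need the real parser.

```python
import re

# Keys that make the metadata "dynamic"; the exact list is an assumption.
_DYNAMIC = re.compile(r"^\s*(arch_allow|filter|min_flash|platform_exclude)\s*:")
_ALLOW = re.compile(r"^(\s*)platform_allow:\s*(.*?)\s*$")


def _clean(token):
    """Drop a trailing comment, surrounding whitespace, and quotes."""
    return token.split("#", 1)[0].strip().strip("'\"")


def platform_allow_fallback(text):
    """Collect platform_allow boards from testcase.yaml/sample.yaml text.

    Handles only the common forms: a space-separated string, a flow list
    ([a, b]), and a block list ("- a" lines). Returns an empty set when a
    dynamic constraint appears, so the caller falls through to Rule 3.
    """
    lines = text.splitlines()
    if any(_DYNAMIC.match(line) for line in lines):
        return set()
    boards = set()
    i = 0
    while i < len(lines):
        match = _ALLOW.match(lines[i])
        i += 1
        if not match:
            continue
        indent, value = len(match.group(1)), _clean(match.group(2))
        if value.startswith("["):
            # Flow list: [a, "b"]
            boards.update(t for t in map(_clean, value.strip("[]").split(",")) if t)
        elif value:
            # Space-separated string: a b c
            boards.update(value.split())
        else:
            # Block list: consume "- item" lines indented at least as far as the key.
            while i < len(lines):
                line = lines[i]
                stripped = line.strip()
                if not stripped or stripped.startswith("#"):
                    i += 1
                    continue
                item_indent = len(line) - len(line.lstrip())
                if not stripped.startswith("- ") or item_indent < indent:
                    break
                boards.add(_clean(stripped[2:]))
                i += 1
    return boards
```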

The script evaluates each YAML file and returns the union of all boards explicitly allowed under platform_allow. If the file contains broad dynamic constraints (such as arch_allow), the script returns an empty set for that app, directing the extension to skip Rule 2 and fall through to Rule 3 (boards/ scan heuristics).
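Once the YAML has been loaded into plain dicts and lists (for example by ruamel.yaml's safe loader), the union-or-fall-through decision is small. A sketch, assuming the usual testcase.yaml/sample.yaml layout with an optional common: block and a tests: mapping; allowed_boards and DYNAMIC_KEYS are hypothetical names, and the key list mirrors the examples given for Rule 2.

```python
DYNAMIC_KEYS = ("arch_allow", "filter", "min_flash", "platform_exclude")


def _as_list(value):
    """platform_allow may be a space-separated string or a YAML list."""
    if value is None:
        return []
    if isinstance(value, str):
        return value.split()
    return [str(item) for item in value]


def allowed_boards(metadata):
    """Union of platform_allow over `common` and every entry under `tests`.

    Returns an empty set if any scope uses a dynamic constraint, which
    tells the extension to fall through to Rule 3.
    """
    common = metadata.get("common") or {}
    tests = metadata.get("tests") or {}
    scopes = [common] + [entry or {} for entry in tests.values()]
    if any(key in scope for scope in scopes for key in DYNAMIC_KEYS):
        return set()
    boards = set()
    for scope in scopes:
        boards.update(_as_list(scope.get("platform_allow")))
    return boards
```

Read literally, this returns the union even when some test entry has no platform_allow at all; whether such an unconstrained entry should also force a fall-through to Rule 3 is a separate design decision.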

> [!NOTE]
> Expanding dynamic constraints into concrete board lists (for example, every board matching arch_allow: arm) would reintroduce the lockfile explosion that discovery pruning is meant to prevent. Falling through to Rule 3 instead keeps the declared set limited to boards the app explicitly customizes.

The script's output is a JSON object, which the extension decodes in Starlark with json.decode().

The script must write only that final JSON object to stdout. All logging, warnings, and debugging output must go to sys.stderr so that json.decode(res.stdout) in Starlark always receives valid JSON.
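A skeleton of the batch entry point shows the interface and the stdout/stderr discipline together. This is a sketch under assumptions: --apps-json is taken to carry the JSON object inline (it could equally be a file path), boards_for_app is a hypothetical placeholder for the Rule 2 evaluation, and the resolve parameter exists only so the parsing logic can be swapped in or tested.

```python
import argparse
import json
import sys
from pathlib import Path

METADATA_FILES = ("testcase.yaml", "sample.yaml")


def boards_for_app(app_dir):
    """Placeholder for Rule 2: locate the metadata file and return its boards.

    Real parsing is elided; an empty list tells the extension to use Rule 3.
    """
    for name in METADATA_FILES:
        if (app_dir / name).is_file():
            print(f"would parse {app_dir / name}", file=sys.stderr)  # diagnostics only
            return []
    return []


def main(argv=None, resolve=boards_for_app):
    parser = argparse.ArgumentParser(description="Resolve platform_allow boards for all apps")
    parser.add_argument("--apps-json", required=True,
                        help='JSON object, e.g. {"//apps/blinky": "/abs/path/apps/blinky"}')
    parser.add_argument("--zephyr-root", required=True)
    args = parser.parse_args(argv)

    # Make twister importable for the real parser (unused in this sketch).
    sys.path.insert(0, str(Path(args.zephyr_root) / "scripts" / "pylib" / "twister"))

    apps = json.loads(args.apps_json)
    result = {pkg: sorted(resolve(Path(path))) for pkg, path in sorted(apps.items())}

    # stdout carries exactly one JSON document; all diagnostics went to stderr.
    json.dump(result, sys.stdout, sort_keys=True)
    sys.stdout.write("\n")
    return 0

# In the real script: if __name__ == "__main__": sys.exit(main())
```

On the Starlark side, the extension would run the script once with mctx.execute and pass res.stdout to json.decode(), yielding a dict from app package to board list.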

Filesystem Heuristics and Fallback

Rule 3: Filesystem Pruning (boards/ scan): For combinations not resolved by Rules 1 or 2 (including mixed cases with in-tree boards), the extension uses the shared filesystem heuristics module scripts/build/discovery_utils.py to search for overrides. A repository is declared if <board>.conf, <board>.overlay, or revision-specific overrides exist in the app's boards/ directory.

If a board target has qualifiers (e.g. board/qualifiers), candidate file names use the form <board_name>_<qualifiers>.conf, with each / in the qualifiers replaced by _. This heuristic should be extracted from kconfig_gen_values.py:268-274 and shared.
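A sketch of the candidate-name expansion that the shared discovery_utils.py module could expose. The function names, the inclusion of the bare board name as a candidate, and the revision spelling (dots replaced by underscores, appended last) are assumptions for illustration; the authoritative rules are the ones to be extracted from kconfig_gen_values.py:268-274.

```python
from pathlib import Path


def candidate_stems(board_target):
    """File stems an app might use to override a board target (assumed rules).

    "nrf9160dk@0.14.0/nrf9160/ns" -> board "nrf9160dk", revision "0_14_0",
    qualifiers "nrf9160_ns"; "/" and "." become "_".
    """
    head, _, qualifiers = board_target.partition("/")
    board, _, revision = head.partition("@")
    qual = qualifiers.replace("/", "_")
    rev = revision.replace(".", "_")
    stems = [board]
    if qual:
        stems.append(f"{board}_{qual}")
    if rev:
        stems.append(f"{board}_{rev}")
        if qual:
            stems.append(f"{board}_{qual}_{rev}")
    return stems


def has_board_overrides(app_dir, board_target, suffixes=(".conf", ".overlay")):
    """True if app_dir/boards/ holds any candidate override file."""
    boards_dir = Path(app_dir) / "boards"
    return any(
        (boards_dir / f"{stem}{suffix}").is_file()
        for stem in candidate_stems(board_target)
        for suffix in suffixes
    )
```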

Rule 4: Manual Boards Fallback: The global manual list defined in MODULE.bazel (e.g., manual_boards = ["nrf52840dk_nrf52840"]) is unconditionally merged into every app's discovered board list, acting as a fallback.

2. Manual Boards Syntax & Implementation

To support apps that rely on the default configuration without custom overlays, users can explicitly allowlist boards using the manual_boards parameter in their MODULE.bazel.

Configuration Syntax in `MODULE.bazel`

zephyr_setup.env(
    apps_dirs = ["//apps"],
    boards_dirs = ["//boards"],
    manual_boards = [
        "nrf52840dk_nrf52840",
        "same70q21b",
    ],
)

Modifying `setup.bzl`

Tag Class: Add manual_boards = attr.string_list() to the _env tag class attributes in setup.bzl.
Extension Logic:
- In _zephyr_setup_core_impl, iterate over the env tags of all modules and merge their manual_boards into a single, de-duplicated list.
- Add manual_boards to the JSON written to @zephyr_state//:state.json.
Pruning logic: In _zephyr_setup_apps_impl, the extension reads manual_boards from the state and always generates repositories for these boards combined with every app in the discovery loop.
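The tag aggregation and state serialization can be sketched in Python, which maps closely onto Starlark. In this stand-in, tags are modeled as plain dicts (in setup.bzl they would be tag objects read as tag.manual_boards), and collect_manual_boards and state_json are hypothetical names; state fields other than manual_boards are passed through untouched because their names are not specified here.

```python
import json


def collect_manual_boards(env_tags):
    """Merge manual_boards from all zephyr_setup.env tags, keeping first-seen order.

    A dict is used for de-duplication so the same code reads naturally in Starlark.
    """
    merged = {}
    for tag in env_tags:
        for board in tag.get("manual_boards", []):
            merged.setdefault(board, True)
    return list(merged)


def state_json(env_tags, **other_fields):
    """Serialize the content of @zephyr_state//:state.json with manual_boards added."""
    state = dict(other_fields)
    state["manual_boards"] = collect_manual_boards(env_tags)
    return json.dumps(state, sort_keys=True)
```

Preserving first-seen order (rather than sorting) keeps the output stable when modules are processed in a fixed order, which avoids spurious lockfile churn.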

Implementation Plan

The implementation will proceed in the following stages:

Phase 1: Manual Boards Integration

File: setup.bzl
Changes:
- Line 665: Add manual_boards = attr.string_list() to the _env tag class attributes.
- Line 574: In _zephyr_setup_core_impl, aggregate the manual_boards lists, add the result to the state_data dictionary, and write it to @zephyr_state//:state.json.

Phase 2: Heuristic Logic and Pruning

Files: setup.bzl, scripts/build/parse_test_metadata.py [NEW], scripts/build/discovery_utils.py [NEW]
Twister metadata parsing script (parse_test_metadata.py):
- Accepts --apps-json and --zephyr-root.
- Inserts the twister library path at runtime: sys.path.insert(0, args.zephyr_root + "/scripts/pylib/twister").
- Processes all applications in a single batch and returns a JSON mapping from each app to its valid boards.
Shared discovery utilities (discovery_utils.py):
- Extract the overlapping filesystem candidate search logic from kconfig_gen_values.py:268-274.
- Expose the candidate-file helpers as shared utilities for both kconfig_gen_values.py and the discovery logic.
Heuristic pruning in the extension:
- Line 572 in _zephyr_setup_core_impl: Apply the pruning rules after apps and boards have been scanned.
- Invoke the metadata parsing script once from the core extension, passing all apps in a single batch.

Phase 3: Index Building and Validation

File: setup.bzl
Changes:
- _zephyr_setup_core_impl: Filter the surviving combinations and store them in state_data as a dictionary mapping norm_app_pkg: [supported_board_names].
- _zephyr_setup_apps_impl: Read this dictionary from @zephyr_state and only generate configuration repositories for combinations that survived pruning.
- _zephyr_setup_core_impl:594: Update the Cartesian index builder for @zephyr_index so that it only generates index entries for combinations that survived pruning.
- Validate the changes with local builds.
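The consumer side of Phase 3 can be sketched as a function that turns state.json into the list of repositories to declare. The "combinations" key and the repo_name scheme are hypothetical; the sanitization assumes repository names are restricted to letters, digits, underscores, dashes, and dots. The same list would drive the @zephyr_index entries, so both stay in sync by construction.

```python
import json
import re


def repo_name(app_pkg, board):
    """Hypothetical naming scheme; keeps only characters valid in repo names."""
    raw = f"{app_pkg.lstrip('/')}__{board}"
    return re.sub(r"[^A-Za-z0-9_.-]", "_", raw)


def repos_to_declare(state_text):
    """Repositories _zephyr_setup_apps_impl would declare from state.json.

    Only pruned combinations are used; manual boards are unioned in, which is
    harmless if the core extension already folded them into the mapping.
    """
    state = json.loads(state_text)
    manual = set(state.get("manual_boards", []))
    repos = []
    for app_pkg, boards in sorted(state["combinations"].items()):
        for board in sorted(set(boards) | manual):
            repos.append(repo_name(app_pkg, board))
    return repos
```

Sanitizing "/" to "_" can in principle collide (for example a/b versus a_b); a real naming scheme would need to rule that out.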