# Design: Heuristic Board Discovery and Filtering

## Background

### Desired User Experience
The goal of the Zephyr-Bazel integration is to provide a seamless developer
experience that mirrors the flexibility of the native Zephyr build system while
leveraging Bazel's correctness and speed. Ideally, a user should be able to add
a new application or a new board to the workspace, and Bazel should
automatically discover the new combinations without requiring manual
registration in `MODULE.bazel` or other configuration files. Developers expect
to build any app for any board using a simple command like `bazel build
//apps/blinky --platforms=//boards/my_board`.

### Limitations of Bazel and Bzlmod
While the current design achieves this automatic discovery by eagerly declaring
repositories for the Cartesian product of all apps and boards in a module
extension, it hits a hard scaling limit due to Bazel's Bzlmod lockfile
implementation:

1.  **Lockfile Bloat**: Module extensions must declare all repositories they
    produce during the loading phase. For $N$ apps and $M$ boards, this results
    in $N \times M$ repository declarations. Bazel records all these
    declarations in the `MODULE.bazel.lock` file.
2.  **Performance Degradation**: With a large number of apps and boards, the
    lockfile grows to tens of megabytes. Reading, parsing, and updating this
    massive JSON file during the analysis phase causes severe performance
    degradation, even when the lockfile is only being checked and not updated.
3.  **Lazy Execution vs. Eager Declaration**: Bazel's repository rules are
    executed lazily (only when needed), but they are *declared* eagerly by the
    extension. The performance penalty is paid in the loading/analysis phase
    simply by having the repositories registered in the lockfile, regardless of
    whether they are part of the current build graph.

To overcome these limitations, we need a mechanism to prune the number of
declared repositories during the loading phase while maintaining as much of the
automatic discovery experience as possible.

## Repository Structure Alternatives

### One Repository per Application
We considered moving from a repository per board-app combination to a repository
per app model. Two variants were explored:

1.  **Eager Evaluation per App**: The repository rule runs Zephyr scripts for
    all boards supported by the app. This was not chosen because parsing
    Kconfig sequentially for hundreds or thousands of in-tree boards during the
    fetch phase is not viable and would cause unacceptable delays.
2.  **Lazy Evaluation via Actions**: The repository rule generates targets that
    use standard Bazel actions to parse configurations lazily. This was not
    chosen because it loses `select()` behavior on Kconfig symbols. That loss
    is not acceptable, because `select()` is the core mechanism for correctly
    generating the build for all the Zephyr kernel, driver, and subsystem code
    based on the configuration.

### One Repository for All Applications
This option is even worse than the current design. It would require a single
repository to handle all applications and boards. This would either force eager
evaluation of the entire Cartesian product of apps and boards (which is
impossible at scale) or require complex custom rules that still cannot overcome
Bazel's limitations regarding dynamic target generation.

### Conclusion on Structure
Because we cannot sacrifice lazy evaluation of the matrix, and we cannot lose
the ability to use Kconfig symbols in Bazel `select()` statements, the existing
system of **one repository per combination** is required.

## Board Down-Selection Alternatives

Since the repository-per-combination structure is required, the only way to
reduce the lockfile size is to limit the number of combinations declared in the
loading phase. We reviewed several methods for down-selecting boards:

### 1. Explicit Allowlist in MODULE.bazel
*   **Description**: Users manually list active boards in `MODULE.bazel`.
*   **Pros**: Keeps the lockfile small and predictable.
*   **Cons**: Violates the "Automatic Discovery" goal and requires manual
  maintenance.

### 2. Environment Variables
*   **Description**: The module extension reads an environment variable (e.g.,
  `ZEPHYR_BOARDS`) to filter boards.
*   **Pros**: Dynamic, good for local development switching.
*   **Cons**: Changing the variable invalidates the extension and forces
  lockfile updates.

### 3. Smart Heuristics based on App Structure
*   **Description**: The extension scans the app's `boards/` directory and only
  generates repositories for boards that have specific configuration files or
  overlays.
*   **Pros**: Maintains automatic discovery while drastically reducing the
  matrix size.
*   **Cons**: Skips boards that might work with defaults (requires a fallback
  mechanism).

## Proposed Solution: Stratified Heuristic Discovery

### Overview
The proposed solution adopts a stratified approach to reducing the number of
declared repositories. It applies different heuristic rules based on the origin
of the assets (In-Tree vs. Out-of-Tree) and the presence of Zephyr metadata
files, while maintaining a fallback mechanism.

Instead of generating repositories for the full Cartesian product, the module
extension applies the following priority-based rules for each `(app, board)`
combination:

1.  **Out-of-Tree Origin Exemption**: If **both** the application and the board
    are Out-of-Tree (OOT), the extension generates the repository eagerly. This
    assumes that the number of custom OOT apps and boards is small, avoiding
    unnecessary pruning overhead. If either the application or the board is an
    in-tree asset (mixed cases), the combination follows the heuristic rules
    below.
2.  **Metadata-Driven Filtering (Tests/Samples)**: If the application directory
    contains a `testcase.yaml` or `sample.yaml` file, the extension parses it.
    To prevent Cartesian-product explosion in the lockfile, it **only**
    generates repositories for boards that are explicitly listed in
    `platform_allow`. If a test relies on broad dynamic constraints (like
    `arch_allow`, `filter`, `min_flash`) or exclusion (`platform_exclude`), the
    extension skips this rule and drops down to Rule 3 (filesystem heuristics).
3.  **Filesystem-Driven Filtering (Heuristics)**: For all other combinations
    (including mixed cases involving in-tree boards or in-tree apps), the
    extension scans the application's `boards/` directory. It **only** generates
    repositories for boards that have specific config files (e.g.,
    `<board>.conf` or `<board>.overlay`) or specific revision overrides in that
    folder.
4.  **Fallback Filter**: To support building apps for boards that rely on
    default configurations without specific files, the extension accepts an
    explicit list of manual boards via `MODULE.bazel` or an environment
    variable. Repositories are generated for these boards for all apps.
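
The priority order above can be sketched as a single predicate. This is a minimal illustration written in Python rather than Starlark; the function name `should_declare` and its parameters are hypothetical, and it assumes that an empty or missing `platform_allow` set means "no usable metadata", so the decision falls through to the filesystem check.

```python
from typing import Optional


def should_declare(
    app_is_oot: bool,
    board_is_oot: bool,
    board: str,
    platform_allow: Optional[set],
    boards_with_overrides: set,
    manual_boards: set,
) -> bool:
    """Decide whether one (app, board) pair gets a repository declaration."""
    # Rule 4: manually listed boards are always kept, for every app.
    if board in manual_boards:
        return True
    # Rule 1: both sides out-of-tree, declare eagerly.
    if app_is_oot and board_is_oot:
        return True
    # Rule 2: a non-empty platform_allow set is treated as authoritative.
    if platform_allow:
        return board in platform_allow
    # Rule 3: keep only boards with files in the app's boards/ directory.
    return board in boards_with_overrides
```

Because Rule 4 is additive, checking it first or last gives the same result; it is placed first here only to make the "always kept" behavior obvious.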

### Pros
*   **Precise Lockfile Reduction**: The number of declared repositories scales
  with the number of supported combinations.
*   **High Fidelity for Tests**: Using `testcase.yaml` prevents generating
  thousands of useless test/board combinations.
*   **Maintains Automatic Discovery**: Adding a board file or updating a YAML
  file automatically triggers discovery.
*   **Preserves `select()` and Lazy Execution**: Keeps the
  repository-per-combination model.

### Cons
*   **Increased Discovery Logic Complexity**: The extension must handle YAML
  parsing and directory scanning.

## Detailed Design

### 1. Discovery Strategy Implementation

The discovery pruning logic will be implemented inside `_zephyr_setup_core_impl`
in `setup.bzl`. This function is responsible for scanning application
directories, applying the heuristic filtering rules, and building the index of
combinations that survive pruning.

#### Origin and Metadata Checks
*   **Rule 1: OOT Origin Exemption**:
    For each `(app, board)` pair, if the app is located in an out-of-tree
    directory and the board is also an OOT board, the extension will
    unconditionally generate the configuration repository. In mixed cases
    (where one of the two is an in-tree asset), the combination is subject to
    the metadata and filesystem heuristics below.

    An app is considered Out-of-Tree if it originates from a directory
    explicitly listed in `apps_dirs`. A board is considered Out-of-Tree if it
    originates from a directory explicitly listed in `boards_dirs`. The in-tree
    boards folder inside `zephyr_root` is automatically scanned and considered
    in-tree.
*   **Rule 2: Metadata-driven Filtering (YAML)**:
    Before directory scanning, the extension will look for `testcase.yaml` or
    `sample.yaml` in the app's root. To parse these files, it will invoke a
    small Python helper script. The script is run from inside the
    `zephyr_setup` module extension, where the path to `@zephyr` has already
    been resolved and recorded in `@zephyr_state//:state.json`.

To safely resolve the path to the helper script inside the module extension in
`setup.bzl`, the extension should use
`script_path = mctx.path(Label("@zephyr-bazel//scripts/build:parse_test_metadata.py"))`.

> [!IMPORTANT]
> To avoid Cartesian-product scaling and performance degradation during the
> loading phase, the helper script must process all applications at once. It
> receives a JSON mapping of application package names to their absolute paths
> via a `--apps-json` argument and returns a resolved JSON mapping of valid
> board combinations.

The Python helper script adds `@zephyr//scripts/pylib/twister` to `sys.path` at
runtime so that it can import the twister modules. If twister dependencies
(like `ruamel.yaml`) are missing from the Python environment, the script should
fall back to a lightweight internal YAML parser for `platform_allow` entries so
that it stays robust during the Bazel loading phase.

The script evaluates the YAML file and returns the union of all boards
explicitly allowed under `platform_allow`. If the file contains broad dynamic
constraints (like `arch_allow`), the script returns an empty set, directing the
extension to skip Rule 2 and drop down to Rule 3 (boards/ scan heuristics).

> [!NOTE]
> Returning the full list of boards that match a dynamic constraint (such as
> all boards matching `arch_allow: arm`) would trigger lockfile explosion
> again, which defeats the purpose of the pruning. Dropping down to Rule 3
> acts as the fallback in that case.

The output forms a JSON structure which Starlark decodes using `json.decode()`.

The script must write only the final JSON document to `stdout`. All other
logging, warnings, or debugging print statements must be routed to `sys.stderr`
so that Starlark's `json.decode(res.stdout)` parses the output reliably.

#### Filesystem Heuristics and Fallback
*   **Rule 3: Filesystem Pruning (boards/ scan)**:
    For combinations involving in-tree boards without matching YAML metadata,
    the extension will use the shared filesystem heuristics module
    `scripts/build/discovery_utils.py` to search for overrides. A repository is
    declared if `<board>.conf`, `<board>.overlay`, or specific revision
    overrides exist in the app's `boards/` directory.

    If a board has a qualified name (e.g. `board/qualifiers`), candidate files
    should follow the format `<board_name>_<qualifiers>.conf`. This logic
    should be extracted from `kconfig_gen_values.py:268-274` and shared.
*   **Rule 4: Manual Boards Fallback**:
    The global manual list defined in `MODULE.bazel` (e.g., `manual_boards =
    ["nrf52840dk_nrf52840"]`) will be unconditionally combined with any
    discovered lists, acting as a fallback.
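
The candidate-file check described under Rule 3 might look roughly like the sketch below. The helper names `candidate_stems` and `has_board_override` are hypothetical and are not taken from `kconfig_gen_values.py`. The sketch assumes that every `/` in a qualified board name maps to `_`, that the unqualified `<board>` file also counts as a match, and it leaves revision overrides out entirely.

```python
from pathlib import Path


def candidate_stems(board: str) -> list:
    """Map a board target to the file stems to look for in boards/."""
    name, _, qualifiers = board.partition("/")
    stems = [name]
    if qualifiers:
        # Assumed convention: "board/soc/core" -> "board_soc_core".
        stems.insert(0, name + "_" + qualifiers.replace("/", "_"))
    return stems


def has_board_override(app_dir: Path, board: str) -> bool:
    """True if the app ships a .conf or .overlay file for this board."""
    boards_dir = app_dir / "boards"
    return any(
        (boards_dir / (stem + ext)).is_file()
        for stem in candidate_stems(board)
        for ext in (".conf", ".overlay")
    )
```

For example, `candidate_stems("nrf5340dk/nrf5340/cpuapp")` yields the qualified stem first and the bare board name second, so a more specific file is checked before a generic one.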

### 2. Manual Boards Syntax & Implementation

To support apps that rely on the default configuration without custom overlays,
users can explicitly allowlist boards using the `manual_boards` parameter in
their `MODULE.bazel`.

#### Configuration Syntax in `MODULE.bazel`
```python
zephyr_setup.env(
    apps_dirs = ["//apps"],
    boards_dirs = ["//boards"],
    manual_boards = [
        "nrf52840dk_nrf52840",
        "same70q21b",
    ],
)
```

#### Modifying `setup.bzl`
*   **Tag Class**: Add `manual_boards = attr.string_list()` to the `_env` tag
  class attributes in `setup.bzl`.
*   **Extension Logic**:
    *   In `_zephyr_setup_core_impl`, iterate over the tags and accumulate a
      workspace-wide `manual_boards` list.
    *   Add `manual_boards` to the JSON written to `@zephyr_state//:state.json`.
*   **Pruning logic**: In `_zephyr_setup_apps_impl`, the extension will read
  `manual_boards` from the state and ensure that repositories are always
  generated for these boards, for every app in the discovery loop.

## Implementation Plan

The implementation will proceed in the following stages:

### Phase 1: Manual Boards Integration
*   **File**: `setup.bzl`
*   **Changes**:
    *   Line 665: Add `manual_boards = attr.string_list()` to the `_env` tag
      class attributes.
    *   Line 574: In `_zephyr_setup_core_impl`, aggregate the manual board
      lists, add the result to the `state_data` dictionary, and write it to
      `@zephyr_state//:state.json`.

### Phase 2: Heuristic Logic and Pruning
*   **Files**: `setup.bzl`, `scripts/build/parse_test_metadata.py` [NEW],
  `scripts/build/discovery_utils.py` [NEW]
*   **Twister metadata parsing script (Python)**:
    *   Accepts `--apps-json` and `--zephyr-root`.
    *   Extends the import path at runtime: `sys.path.insert(0, args.zephyr_root +
      "/scripts/pylib/twister")`.
    *   Processes multiple applications in a single batch, returning a JSON
      mapping of valid app-board combinations.
*   **Shared filesystem discovery module**:
    *   Extract the overlapping candidate-file search logic from
      `kconfig_gen_values.py:268-274`.
    *   Expose it as shared utilities for checking candidate files.
*   **Pruning logic in the extension**:
    *   Line 572 in `_zephyr_setup_core_impl`: Apply the pruning heuristics
      after scanning apps and boards.
    *   Invoke the Python helper script from the core extension to parse the
      test metadata YAML files.
### Phase 3: Index Building and Validation
*   **File**: `setup.bzl`
*   **Changes**:
    *   `_zephyr_setup_core_impl`: Filter surviving combinations and store valid
      combinations in `state_data` as a dictionary mapping `norm_app_pkg:
      [supported_board_names]`.
    *   `_zephyr_setup_apps_impl`: Ensure the extension relies on this
      dictionary inside `@zephyr_state` and only generates configuration
      repositories for combinations that survived pruning.
    *   `_zephyr_setup_core_impl:594`: Update the index builder for
      `@zephyr_index` so that it only generates index entries for combinations
      that survived pruning.
    *   Run and validate with local build updates.
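
To make the Phase 3 data shape concrete, the following sketch builds the `norm_app_pkg: [supported_board_names]` mapping from a pruning predicate. It is written in Python for consistency with the helper scripts, whereas the real logic would live in Starlark in `setup.bzl`. The function name `build_survivor_index` and the `survives` callback are hypothetical, and keeping apps with an empty board list in the mapping is an assumption.

```python
def build_survivor_index(apps, boards, survives, manual_boards):
    """Return {app_pkg: [board, ...]} for combinations that pass pruning.

    `survives(app, board)` stands in for Rules 1 to 3; boards listed in
    `manual_boards` are always included (Rule 4).
    """
    index = {}
    for app in apps:
        kept = [
            board
            for board in boards
            if board in manual_boards or survives(app, board)
        ]
        # Sorted output keeps the generated state stable between runs.
        index[app] = sorted(kept)
    return index
```

Both `_zephyr_setup_apps_impl` and the `@zephyr_index` builder could then iterate over this one mapping, so the declared repositories and the index entries cannot drift apart.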


