Scripts for GitHub CI

A set of gh_*.py scripts work together to produce size comparisons for PRs.

Reports on Pull Requests

The scripts' results are presented as comments on PRs.

Note that a comment may be updated by the scripts as CI run results become available.

Note that the scripts will not create a comment for a commit if there is already a newer commit in the PR.

A size report comment consists of a title followed by one to four tables. A title looks like:

PR #12345678: Size comparison from base-SHA to pr-SHA

The first table, if present, lists items with a large increase, according to a configurable threshold.

The next table, if present, lists all items that have increased in size.

The next table, if present, lists all items that have decreased in size.

The final table, always present, lists all items.
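
As an illustration of the table layout described above, the following Python sketch splits a list of size changes into the four tables. It is not the scripts' actual code: the item keys (name, before, after), the table titles, and the strict greater-than comparison against the threshold are all assumptions made for this example.

```python
def percent_growth(item):
    """Growth of an item relative to its base size, in percent."""
    if item["before"] == 0:
        # Assumption: a brand-new item counts as unbounded growth.
        return float("inf") if item["after"] > 0 else 0.0
    return 100.0 * (item["after"] - item["before"]) / item["before"]


def build_tables(changes, threshold_percent):
    """Return (title, rows) pairs for one report; empty optional tables are omitted.

    `changes` is a list of dicts with hypothetical keys "name", "before", "after".
    """
    increases = [c for c in changes if c["after"] > c["before"]]
    decreases = [c for c in changes if c["after"] < c["before"]]
    large = [c for c in increases if percent_growth(c) > threshold_percent]

    tables = []
    if large:
        tables.append(("Large increases", large))
    if increases:
        tables.append(("Increases", increases))
    if decreases:
        tables.append(("Decreases", decreases))
    # The full listing is always present, even when nothing changed.
    tables.append(("All items", changes))
    return tables
```

With one grown item, one shrunk item, and a low threshold, this yields all four tables; with no changes at all it yields only the final one.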

Usage in CI

The original intent was to have a tool that would run after a build in CI, add its sizes to a central database, and immediately report on size changes from the parent commit in the database. Unfortunately, GitHub provides no practical place to store and share such a database between workflow actions. Instead, the process is split; builds in CI record size information in the form of GitHub artifacts, and a later step reads these artifacts to generate reports.

1. Build workflows

gh_sizes_environment.py

The gh_sizes_environment.py script should be run once in each workflow that records sizes, after checkout and before any use of gh_sizes.py. It takes a single argument, a JSON dictionary of the github context, and is typically run as:

    steps:
        - name: Checkout
          uses: actions/checkout@v3
          with:
              submodules: true

        - name: Set up environment for size reports
          if: ${{ !env.ACT }}
          env:
              GH_CONTEXT: ${{ toJson(github) }}
          run: scripts/tools/memory/gh_sizes_environment.py "${GH_CONTEXT}"
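
To make the role of this step more concrete, here is a rough Python sketch of how a script of this kind might turn the github context into the GH_EVENT_* variables used in the artifact name later in this document. The real gh_sizes_environment.py may differ: the function names, the choice of payload fields, and the use of 0 as the PR number for non-PR events are assumptions.

```python
import json


def size_report_environment(github_context_json):
    """Derive GH_EVENT_* values from a JSON-encoded github context."""
    github = json.loads(github_context_json)
    event = github.get("event", {})
    if github.get("event_name") == "pull_request":
        pr = event["pull_request"]
        return {
            "GH_EVENT_PR": str(pr["number"]),
            "GH_EVENT_HASH": pr["head"]["sha"],
            "GH_EVENT_PARENT": pr["base"]["sha"],
        }
    # Assumed handling of push (trunk) events: there is no PR, so use 0.
    return {
        "GH_EVENT_PR": "0",
        "GH_EVENT_HASH": event.get("after", github.get("sha", "")),
        "GH_EVENT_PARENT": event.get("before", ""),
    }


def write_github_env(values, env_file_path):
    """Append NAME=value lines to the file that GitHub Actions reads as $GITHUB_ENV."""
    with open(env_file_path, "a", encoding="utf-8") as env_file:
        for name, value in values.items():
            env_file.write(f"{name}={value}\n")
```

In a workflow, the second function would be called with the path found in the GITHUB_ENV environment variable, which makes the values available as env.GH_EVENT_PR and so on in later steps.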

gh_sizes.py

The gh_sizes.py script runs on a built binary (executable or library) and produces a JSON file containing size information.

Usage: gh_sizes.py platform config target binary [output]

Where:

  • platform is the platform name, corresponding to a config file in scripts/tools/memory/platform/.
  • config is a configuration identification string. This has no fixed meaning, but is intended to describe a build variation, e.g. a particular target board or debug vs release.
  • target is a readable name for the build artifact, identifying it in reports.
  • binary is the input build artifact.
  • output is the name of the output JSON file, or a directory for it, in which case the file is named platform-config-target-sizes.json.

Example:

    scripts/tools/memory/gh_sizes.py \
        linux arm64 thermostat-no-ble \
        out/linux-arm64-thermostat-no-ble/thermostat-app \
        /tmp/bloat_reports/
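
The naming rule for a directory output can be sketched in a few lines of Python. This is only an illustration: the helper name resolve_output is hypothetical, and the way a directory is detected here (an existing directory or a trailing slash) is an assumption, not necessarily what gh_sizes.py does.

```python
from pathlib import Path


def resolve_output(output, platform, config, target):
    """Return the JSON file path implied by the `output` argument."""
    path = Path(output)
    # Assumption: treat an existing directory, or a name ending in "/", as a directory.
    if path.is_dir() or str(output).endswith("/"):
        return path / f"{platform}-{config}-{target}-sizes.json"
    return path
```

For the example above, this gives /tmp/bloat_reports/linux-arm64-thermostat-no-ble-sizes.json.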

Upload artifacts

The JSON files generated by gh_sizes.py must be uploaded under an artifact name of exactly the comma-separated form shown below; otherwise the reporting scripts cannot process them.

Example:

    Size,Linux-Examples,${{ env.GH_EVENT_PR }},${{ env.GH_EVENT_HASH }},${{ env.GH_EVENT_PARENT }},${{ github.event_name }}

Other builds must replace Linux-Examples with a label unique to the workflow, but otherwise use the form exactly.
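
Since the artifact name is a fixed sequence of comma-separated fields, it can be built and taken apart mechanically. The Python sketch below shows one way to do so. The field names (label, pr, commit, parent, event) are labels chosen for this example based on the variables in the form above, and the helper functions are hypothetical, not part of the scripts.

```python
from typing import NamedTuple, Optional


class SizeArtifact(NamedTuple):
    label: str   # workflow label, e.g. "Linux-Examples"
    pr: str      # value of GH_EVENT_PR
    commit: str  # value of GH_EVENT_HASH
    parent: str  # value of GH_EVENT_PARENT
    event: str   # value of github.event_name


def format_artifact_name(artifact: SizeArtifact) -> str:
    """Join the fields in the required order, prefixed by the literal "Size"."""
    return ",".join(("Size",) + tuple(artifact))


def parse_artifact_name(name: str) -> Optional[SizeArtifact]:
    """Return the fields of a size artifact name, or None for any other artifact."""
    fields = name.split(",")
    if len(fields) != 6 or fields[0] != "Size":
        return None
    return SizeArtifact(*fields[1:])
```

Parsing returns None for names that do not match, which lets a caller skip unrelated artifacts in the same repository.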

2. Reporting workflow

Run a periodic workflow that calls gh_report.py to generate PR comments. The script documents all of its options under --help; typical use looks like this:

    scripts/tools/memory/gh_report.py \
        --verbose \
        --report-increases 0.2 \
        --report-pr \
        --github-comment \
        --github-limit-artifact-pages 50 \
        --github-limit-artifacts 500 \
        --github-limit-comments 20 \
        --github-repository project-chip/connectedhomeip \
        --github-api-token "${{ secrets.GITHUB_TOKEN }}"

Notably, the --report-increases flag sets the growth threshold, in percent, above which an increase is called out as ‘large’ in GitHub comments; the example above uses 0.2%.

When this script successfully posts a comment on a GitHub PR, it removes the corresponding PR artifact(s) so that a future run will not process them again and post the same comment. Only PR artifacts are removed, not push (trunk) artifacts, since those may be used as a comparison base by many different PRs.
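
The cleanup rule above amounts to a simple filter. The following Python sketch expresses it under stated assumptions: artifacts are represented as dicts with hypothetical keys name, event, and pr, and the event values are the standard github.event_name strings pull_request and push. It does not reflect how gh_report.py is actually structured.

```python
def artifacts_to_remove(artifacts, commented_prs):
    """Select artifact names that are safe to delete after a reporting run.

    `commented_prs` is the set of PR numbers for which a comment was posted.
    """
    return [
        artifact["name"]
        for artifact in artifacts
        # Push (trunk) artifacts are kept: other PRs may still need them as a base.
        if artifact["event"] == "pull_request" and artifact["pr"] in commented_prs
    ]
```

Artifacts of PRs whose comment could not be posted stay in place, so a later run can try again.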

Using a database

It can be useful to keep a permanent record of build sizes.

Updating the database: gh_db_load.py

To update an SQLite file of trunk commit sizes, periodically run:

    gh_db_load.py \
        --repo project-chip/connectedhomeip \
        --token ghp_ThIsIsNoTMyReAlGiThUbToKeNSoDoNoTtRy \
        --db /path/to/database

To load only a single platform, add the --github-label option with the label that follows Size, in the artifact name (e.g. Linux-Examples in the upload example above).

See --help for additional options.

Note: Transient 4xx and 5xx errors from GitHub's API are very common. Run gh_db_load.py frequently enough to give it several attempts before the relevant artifacts expire.

Querying the database: gh_db_query.py

While the database can of course be used directly, the gh_db_query.py script provides a handful of common queries.

Note that this script (like others that show tables) has an --output-format option offering (among others) CSV, several JSON formats, and any text format provided by tabulate.

Two notable options:

  • --query-build-sizes PLATFORM,CONFIG,TARGET lists sizes for all builds of the given kind, with a column for each section.
  • --query-section-changes PLATFORM,CONFIG,TARGET,SECTION lists changes for the given section. The --report-increases PERCENT option limits this to changes over a given threshold (as is done for PR comments).

(To find out what PLATFORM, CONFIG, TARGET, and SECTION exist: --query-platforms, then --query-platform-targets=PLATFORM and --query-platform-sections=PLATFORM.)

See --help for additional options.
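
For readers who prefer to query the SQLite file directly, the Python sketch below shows the general shape of a section-changes query with a percent threshold. The schema is entirely hypothetical: the table name sizes and its columns are invented for this example and will not match the database that gh_db_load.py creates, so inspect the real schema (for instance with .schema in the sqlite3 shell) before adapting it.

```python
import sqlite3


def section_changes(db, platform, config, target, section, threshold_percent=None):
    """List (commit, before, after, percent) for consecutive builds whose size changed.

    Assumes a hypothetical table:
      sizes(platform, config, target, section, commit_hash, commit_time, size)
    """
    rows = db.execute(
        "SELECT commit_hash, size FROM sizes"
        " WHERE platform = ? AND config = ? AND target = ? AND section = ?"
        " ORDER BY commit_time",
        (platform, config, target, section),
    ).fetchall()

    changes = []
    # Compare each build with the one immediately before it.
    for (_, before), (commit, after) in zip(rows, rows[1:]):
        if before == 0 or after == before:
            continue
        percent = 100.0 * (after - before) / before
        # With a threshold, keep only increases above it; without one, keep every change.
        if threshold_percent is None or percent > threshold_percent:
            changes.append((commit, before, after, percent))
    return changes
```

Here db is a connection obtained from sqlite3.connect() on the database file.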