A set of gh_*.py scripts work together to produce size comparisons for PRs.
The scripts' results are presented as comments on PRs.
Note that a comment may be updated by the scripts as CI run results become available.
Note that the scripts will not create a comment for a commit if there is already a newer commit in the PR.
A size report comment consists of a title followed by one to four tables. A title looks like:
PR #12345678: Size comparison from base-SHA to pr-SHA
The first table, if present, lists items with a large increase, according to a configurable threshold.
The next table, if present, lists all items that have increased in size.
The next table, if present, lists all items that have decreased in size.
The final table, always present, lists all items.
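The table layout described above can be sketched as follows. This is a minimal illustration, not the logic used by the scripts: the function names, the `(name, base, new)` tuple shape, and the reading of the threshold as strict percent growth relative to the base size are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical item shape: (name, size at base commit, size at PR commit).
Item = Tuple[str, int, int]


def percent_change(base: int, new: int) -> float:
    """Percent growth relative to the base size."""
    if base == 0:
        # Assumption: growth from an empty base counts as unbounded.
        return float("inf") if new > 0 else 0.0
    return 100.0 * (new - base) / base


def split_report_tables(items: List[Item],
                        threshold_percent: float) -> Dict[str, List[Item]]:
    """Group items into the one to four tables of a size report."""
    increases = [i for i in items if i[2] > i[1]]
    decreases = [i for i in items if i[2] < i[1]]
    large = [i for i in increases
             if percent_change(i[1], i[2]) > threshold_percent]

    tables: Dict[str, List[Item]] = {}
    # The first three tables appear only when they have rows.
    if large:
        tables["large increases"] = large
    if increases:
        tables["increases"] = increases
    if decreases:
        tables["decreases"] = decreases
    # The final table is always present.
    tables["all"] = list(items)
    return tables
```

With this grouping, a PR in which nothing changed size yields only the final table, and a PR with small increases but none over the threshold yields two.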
The original intent was to have a tool that would run after a build in CI, add its sizes to a central database, and immediately report on size changes from the parent commit in the database. Unfortunately, GitHub provides no practical place to store and share such a database between workflow actions. Instead, the process is split; builds in CI record size information in the form of GitHub artifacts, and a later step reads these artifacts to generate reports.
The gh_sizes_environment.py script should be run once in each workflow that records sizes, after checkout and before any use of gh_sizes.py. It takes a single argument, a JSON dictionary of the github context. Typically run as:

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          submodules: true

      - name: Set up environment for size reports
        if: ${{ !env.ACT }}
        env:
          GH_CONTEXT: ${{ toJson(github) }}
        run: scripts/tools/memory/gh_sizes_environment.py "${GH_CONTEXT}"
The gh_sizes.py script runs on a built binary (executable or library) and produces a JSON file containing size information.
Usage: gh_sizes.py platform config target binary [output]

Where platform is the platform name, corresponding to a config file in scripts/tools/memory/platform/.
Where config is a configuration identification string. This has no fixed meaning, but is intended to describe a build variation, e.g. a particular target board or debug vs release.
Where target is a readable name for the build artifact, identifying it in reports.
Where binary is the input build artifact.
Where output is the name for the output JSON file, or a directory for it, in which case the name will be platform-config_name-target_name-sizes.json.
Example:
    scripts/tools/memory/gh_sizes.py \
        linux arm64 thermostat-no-ble \
        out/linux-arm64-thermostat-no-ble/thermostat-app \
        /tmp/bloat_reports/
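When output is a directory, the file name is derived from the platform, config, and target, as described above. The sketch below shows one plausible way to do that. It assumes the three strings are joined with hyphens and not otherwise normalized, which the actual script may not do; `default_output_path` is a hypothetical helper, not part of gh_sizes.py.

```python
import os


def default_output_path(platform: str, config: str, target: str,
                        output: str) -> str:
    """Return the JSON path to write, given the optional output argument."""
    # Assumption: names are joined as-is, with no sanitization.
    filename = f"{platform}-{config}-{target}-sizes.json"
    # Treat an existing directory, or a path with a trailing separator,
    # as "a directory for it"; anything else is taken as the file name.
    if os.path.isdir(output) or output.endswith(("/", os.sep)):
        return os.path.join(output, filename)
    return output
```

Under these assumptions, the arguments from the example above would produce /tmp/bloat_reports/linux-arm64-thermostat-no-ble-sizes.json.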
The JSON files generated by gh_sizes.py must be uploaded with an artifact name of a very specific form in order to be processed correctly.
Example:
Size,Linux-Examples,${{ env.GH_EVENT_PR }},${{ env.GH_EVENT_HASH }},${{ env.GH_EVENT_PARENT }},${{ github.event_name }}
Other builds must replace Linux-Examples with a label unique to the workflow, but otherwise use the form exactly.
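To make the required form concrete, here is a minimal sketch that builds and parses such an artifact name. The field meanings (PR number, commit hash, parent hash, event name) are inferred from the variable names in the example above, and the class and function names are hypothetical; the report script's own parsing may differ.

```python
from typing import NamedTuple, Optional


class SizeArtifactName(NamedTuple):
    label: str   # workflow-unique label, e.g. "Linux-Examples"
    pr: str      # from GH_EVENT_PR
    commit: str  # from GH_EVENT_HASH
    parent: str  # from GH_EVENT_PARENT
    event: str   # from github.event_name


def format_artifact_name(name: SizeArtifactName) -> str:
    """Join the fields in the comma-separated form shown above."""
    return ",".join(("Size",) + tuple(name))


def parse_artifact_name(text: str) -> Optional[SizeArtifactName]:
    """Return the parsed fields, or None if the name has another form."""
    fields = text.split(",")
    if len(fields) != 6 or fields[0] != "Size":
        return None
    return SizeArtifactName(*fields[1:])
```

A name with a missing or extra field is rejected here rather than guessed at, which mirrors the requirement that the form be followed exactly.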
Run a periodic workflow calling gh_report.py to generate PR comments. This script has full --help, but normal use is probably best illustrated by an example:
    scripts/tools/memory/gh_report.py \
        --verbose \
        --report-increases 0.2 \
        --report-pr \
        --github-comment \
        --github-limit-artifact-pages 50 \
        --github-limit-artifacts 500 \
        --github-limit-comments 20 \
        --github-repository project-chip/connectedhomeip \
        --github-api-token "${{ secrets.GITHUB_TOKEN }}"
Notably, the --report-increases flag provides a percent growth threshold for calling out ‘large’ increases in GitHub comments.
When this script successfully posts a comment on a GitHub PR, it removes the corresponding PR artifact(s) so that a future run will not process it again and post the same comment. Only PR artifacts are removed, not push (trunk) artifacts, since those may be used as a comparison base by many different PRs.
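The cleanup rule above can be sketched as a simple filter. This is an illustration only: the dictionary keys (`id`, `event`, `pr`) and the function name are hypothetical, and it assumes artifacts are distinguished by the GitHub event names `pull_request` and `push`.

```python
from typing import Dict, Iterable, List, Set


def artifacts_to_delete(artifacts: Iterable[Dict[str, str]],
                        commented_prs: Set[str]) -> List[str]:
    """Select artifact ids that are safe to remove after commenting."""
    return [
        a["id"]
        for a in artifacts
        # Only PR artifacts whose comment was posted are removed.
        # Push (trunk) artifacts stay, since later PRs may still need
        # them as a comparison base.
        if a["event"] == "pull_request" and a["pr"] in commented_prs
    ]
```

Keeping artifacts for PRs whose comment was not posted leaves them available for the next periodic run to retry.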
It can be useful to keep a permanent record of build sizes.
gh_db_load.py
To update an SQLite file of trunk commit sizes, periodically run:
    gh_db_load.py \
        --repo project-chip/connectedhomeip \
        --token ghp_ThIsIsNoTMyReAlGiThUbToKeNSoDoNoTtRy \
        --db /path/to/database
Those interested in only a single platform can add the --github-label option, providing the same name as in the size artifact name after Size, (e.g. Linux-Examples in the upload example above).

See --help for additional options.
Note: Transient 4xx and 5xx errors from GitHub's API are very common. Run gh_db_load.py frequently enough to give it several attempts before the relevant artifacts expire.
gh_db_query.py
While the database can of course be used directly, the gh_db_query.py script provides a handful of common queries.

Note that this script (like others that show tables) has an --output-format option offering (among others) CSV, several JSON formats, and any text format provided by tabulate.
Two notable options:
- --query-build-sizes PLATFORM,CONFIG,TARGET lists sizes for all builds of the given kind, with a column for each section.
- --query-section-changes PLATFORM,CONFIG,TARGET,SECTION lists changes for the given section. The --report-increases PERCENT option limits this to changes over a given threshold (as is done for PR comments).

(To find out what PLATFORM, CONFIG, TARGET, and SECTION exist: --query-platforms, then --query-platform-targets=PLATFORM and --query-platform-sections=PLATFORM.)

See --help for additional options.