CHIP on-device testing

Requirements and high-level design

Background

The ability to run tests on actual and emulated hardware is paramount in embedded projects, and CHIP is no exception. We want on-device testing to be a first-class goal of the CHIP architecture. On-device testing requirements apply both to continuous integration testing during development of the main CHIP software stack and to eventual CHIP product certification. This document explores these requirements and evaluates potential solutions.

Overview of requirements

A good device test infrastructure is built on four pillars.

Pillar 1: Using a test framework

A test framework provides a structure that developers can follow when writing tests and can reduce the burden of test setup and teardown (less boilerplate). Support for state-oriented and asynchronous test structures would be beneficial. Many test frameworks use scripting languages such as Python, which make tests quick to develop and give access to rich libraries for device/system access and results generation.
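As a rough illustration of how a framework absorbs setup and teardown, the sketch below uses Python's standard `unittest.IsolatedAsyncioTestCase` (Python 3.8+), whose `asyncSetUp`/`asyncTearDown` hooks also allow an asynchronous test structure. `SimulatedDevice` and its methods are hypothetical placeholders, not part of CHIP; a real harness would talk to hardware or an emulator instead.

```python
import asyncio
import unittest


class SimulatedDevice:
    """Hypothetical stand-in for a device under test (not a CHIP API)."""

    def __init__(self) -> None:
        self.powered = False

    async def power_on(self) -> None:
        await asyncio.sleep(0)  # placeholder for real I/O (serial, network, ...)
        self.powered = True

    async def power_off(self) -> None:
        await asyncio.sleep(0)
        self.powered = False

    async def ping(self) -> str:
        if not self.powered:
            raise RuntimeError("device is not powered")
        return "pong"


class DeviceSmokeTest(unittest.IsolatedAsyncioTestCase):
    # Setup and teardown live in the framework hooks, so each test body
    # contains only the behaviour being checked.
    async def asyncSetUp(self) -> None:
        self.device = SimulatedDevice()
        await self.device.power_on()

    async def asyncTearDown(self) -> None:
        await self.device.power_off()

    async def test_ping(self) -> None:
        self.assertEqual(await self.device.ping(), "pong")


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeviceSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    outcome = run_suite()
    print("passed" if outcome.wasSuccessful() else "failed")
```

Adding another test means adding one more `async def test_...` method; the power cycling around it comes for free.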

Pillar 2: Dispatching tests

Tests can run on lab machines or on a developer's local workstation. They can be triggered manually by the developer or automatically when a continuous integration (CI) server finishes building a changeset. CHIP involves multiple stakeholders, many of whom will want to contribute lab capacity to the testing effort. The infrastructure must therefore support cross-organization test dispatch.

To dispatch tests uniformly, we will probably need a simple request/response protocol, potentially HTTPS-based and RESTful. Because device tests are long-running, the response to a test scheduling request could be a test ID rather than the test result. That ID could then be used to query the test status, to subscribe to notifications of status changes, and to pull the test results. Core aspects of such a scheme include conventions for the contents of the request artifacts and for the minimum expected contents of the results once the run is complete.
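A minimal in-memory sketch of the ID-based scheme, assuming nothing about the eventual wire format: `schedule` returns immediately with an ID, and that ID is then used to poll the status, subscribe to changes and fetch the results. All class, method and field names (and the REST routes mentioned in the docstring) are hypothetical; over HTTPS, each method would become a request to the lab's endpoint.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class RunStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class RunRequest:
    # Hypothetical request artifact contents; the real conventions are still to be defined.
    image_url: str
    test_suite: str
    device_type: str


@dataclass
class RunRecord:
    request: RunRequest
    status: RunStatus = RunStatus.QUEUED
    results: Optional[dict] = None
    subscribers: List[Callable[[str, RunStatus], None]] = field(default_factory=list)


class Dispatcher:
    """In-memory model of a lab-side dispatcher.

    Over HTTPS these methods might map to routes such as POST /tests,
    GET /tests/{id} and GET /tests/{id}/results (hypothetical).
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._runs: Dict[str, RunRecord] = {}

    def schedule(self, request: RunRequest) -> str:
        # Returns an ID immediately; the test itself runs later on lab hardware.
        test_id = f"test-{next(self._ids)}"
        self._runs[test_id] = RunRecord(request)
        return test_id

    def status(self, test_id: str) -> RunStatus:
        return self._runs[test_id].status

    def subscribe(self, test_id: str, callback: Callable[[str, RunStatus], None]) -> None:
        self._runs[test_id].subscribers.append(callback)

    def results(self, test_id: str) -> Optional[dict]:
        # None until the run has produced results.
        return self._runs[test_id].results

    def report_progress(self, test_id: str, status: RunStatus, results: Optional[dict] = None) -> None:
        # Called by the lab's test runner; notifies every subscriber of the change.
        run = self._runs[test_id]
        run.status = status
        if results is not None:
            run.results = results
        for notify in run.subscribers:
            notify(test_id, status)
```

Returning an ID instead of blocking keeps each request short-lived, which matters when a device test can run far longer than a typical HTTP request timeout.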

Pillar 3: Interacting with devices

The test host environment has to reset devices, flash images onto them, issue commands, monitor their status and collect test results. It may also need to combine virtual (simulated) and real devices in the same test. This can initially be done ad hoc for each platform, but eventually we can move to a device access abstraction, i.e. define a common device testing interface that CHIP-compliant devices can expose. The test host must also be able to drive multiple devices at the same time within a single test, e.g. for tests that check communication between multiple devices.
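One way such a common interface could look, sketched as a Python abstract base class. The method set mirrors the operations listed above (reset, flash, issue commands, monitor status); the method names, the string-based command channel and `SimulatedDevice` are illustrative assumptions, not a proposed CHIP API. The last function shows a single test driving several devices, simulated or real, through the same interface.

```python
from abc import ABC, abstractmethod
from typing import List


class DeviceUnderTest(ABC):
    """Hypothetical common device testing interface; names are illustrative only."""

    @abstractmethod
    def reset(self) -> None: ...

    @abstractmethod
    def flash(self, image: bytes) -> None: ...

    @abstractmethod
    def send_command(self, command: str) -> str: ...

    @abstractmethod
    def status(self) -> str: ...


class SimulatedDevice(DeviceUnderTest):
    """A virtual device; a real board would implement the same methods over serial, JTAG, etc."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.image = b""
        self.log: List[str] = []

    def reset(self) -> None:
        self.log.clear()

    def flash(self, image: bytes) -> None:
        self.image = image

    def send_command(self, command: str) -> str:
        self.log.append(command)
        return f"{self.name}: ok {command}"

    def status(self) -> str:
        return "ready" if self.image else "no image"


def run_multi_device_test(devices: List[DeviceUnderTest], image: bytes) -> List[str]:
    # Bring every device into a known state, then exercise all of them in one test.
    for device in devices:
        device.reset()
        device.flash(image)
        if device.status() != "ready":
            raise RuntimeError(f"device not ready: {device.status()}")
    return [device.send_command("ping") for device in devices]
```

Because the test only sees `DeviceUnderTest`, swapping a simulated device for a real board would not require changing the test body.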

Pillar 4: Collecting results

Ideally, test results are output in standard formats, and similar or analogous results from different devices and tests are output the same way. This makes code that processes similar data reusable and allows results to be aggregated across different dimensions (e.g. per device, per test or per build). Failed tests must propagate errors from the device platform layers all the way up to the CHIP stack, and present errors and any stack traces in a standard result format. Because the purpose of on-device tests is to catch bugs, test outputs must highlight the reason(s) for a failure, so that developers don't have to browse through thousands of lines of logs to find the one line that explains why a test failed.
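A small sketch of why a uniform result record pays off, assuming a hypothetical `CaseResult` with a dedicated `failure_reason` field: once every device and test reports results the same way, aggregation (here, pass rate per device) and a one-line-per-failure summary take only a few lines of generic code. The field names and the JSON serialization are assumptions for illustration.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class CaseResult:
    test: str
    device: str
    passed: bool
    failure_reason: Optional[str] = None  # one-line cause, shown in summaries
    stack_trace: Optional[str] = None     # full detail, kept out of summaries


def to_json(results: List[CaseResult]) -> str:
    return json.dumps([asdict(r) for r in results], indent=2)


def pass_rate_by_device(results: List[CaseResult]) -> Dict[str, float]:
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # device -> [passed, total]
    for r in results:
        counts[r.device][1] += 1
        if r.passed:
            counts[r.device][0] += 1
    return {device: passed / total for device, (passed, total) in counts.items()}


def failure_summary(results: List[CaseResult]) -> List[str]:
    # One line per failure, so the cause is visible without scrolling through logs.
    return [f"{r.device}/{r.test}: {r.failure_reason}" for r in results if not r.passed]
```

Grouping by test or by build instead of by device would only change the key used in `pass_rate_by_device`.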

Priorities

In the spirit of CHIP's charter, it would be great to get something off the ground as soon as possible to support continuous testing of the evolving CHIP stack. We can then improve on that first iteration, even if it means throwing away some temporary concepts and code.

Test dispatch (Pillar 2) emerges as the highest priority, because all the other pillars can be covered by ad-hoc solutions in the meantime. The first need is an interface between a CircleCI job and a test execution host at a participating organization. This would enable dispatching tests to a variety of existing in-house infrastructure, while a common request/response protocol shields the CI system from the implementation details of each lab.
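Because every lab implements the same request/response protocol, the CI side can stay generic. Below is a sketch of a client-side wait loop, assuming a hypothetical `get_status` callable that wraps the lab's status query and returns strings such as `"queued"`, `"running"`, `"passed"`, `"failed"` or `"error"`. It polls with exponential backoff and gives up after a timeout; `sleep` and `clock` are injectable so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], str],
    timeout_s: float = 3600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a lab's status query until the test finishes or the timeout expires.

    get_status wraps whatever call the lab exposes (e.g. a GET on a status URL);
    the CI job needs to know nothing else about the lab.
    """
    terminal = {"passed", "failed", "error"}
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"test still '{status}' after {timeout_s} s")
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

If labs also offer status-change notifications, a CI job could use those instead and keep polling only as a fallback.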

The next most important goal is to provide a test framework (Pillar 1). With a standard framework, developers can start writing tests, even if those tests are device-specific and use ad-hoc input and output formats. The general structure of the tests will nevertheless be in place, and the tests can later be adapted to standard device interactions (Pillar 3) and result formats (Pillar 4).

Specifying result formats (Pillar 4) for the most common outputs (success/failure, failure reason, stack trace, memory and CPU usage time series, pcaps of network traffic, etc.) will be an ongoing effort. The simplest output formats can be specified together with the test framework.
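The document does not yet choose a format, but the de facto JUnit XML format is one candidate for the simplest outputs (success/failure, failure reason, stack trace), since many CI systems, CircleCI among them, can ingest it. The sketch below emits a minimal subset of it using only the standard library; the tuple layout and function name are illustrative assumptions, and richer outputs such as time series or pcaps would need separate conventions.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (name, failure_reason, stack_trace); failure_reason is None for a passing test.
Case = Tuple[str, Optional[str], Optional[str]]


def to_junit_xml(suite_name: str, cases: List[Case]) -> str:
    failures = sum(1 for _, reason, _ in cases if reason is not None)
    suite = ET.Element(
        "testsuite", name=suite_name, tests=str(len(cases)), failures=str(failures)
    )
    for name, reason, trace in cases:
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=name)
        if reason is not None:
            # The short reason goes in the 'message' attribute, the full trace in the body.
            failure = ET.SubElement(case, "failure", message=reason)
            failure.text = trace or ""
    return ET.tostring(suite, encoding="unicode")
```

Keeping the one-line reason separate from the full trace lets CI dashboards show the cause at a glance while the details remain one click away.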

Lastly, we want to explore a common device interaction interface (Pillar 3) that would allow tests to be reused across different devices.

Baseline hardware platforms for CHIP

The TSG is targeting the following platforms/boards for early bring-up:

  • Nordic nRF52 board <TODO: REF>
  • SiLabs XXXX board <TODO: REF>
  • Espressif ESP32 XXXX board <TODO: REF>