# Precompiling

Precompiling is compiling Python source files (`.py` files) into byte code
(`.pyc` files) at build time instead of runtime. Doing it at build time can
improve performance by skipping that work at runtime.

Precompiling is disabled by default, so you must enable it using flags or
attributes to use it.

## Overhead of precompiling

While precompiling helps runtime performance, it has two main costs:
1. Increasing the size (count and disk usage) of runfiles. It approximately
   doubles the count of the runfiles because for every `.py` file, there is also
   a `.pyc` file. Compiled files are generally around the same size as the
   source files, so it approximately doubles the disk usage.
2. Precompiling requires running an extra action at build time. While
   compiling itself isn't that expensive, the overhead can become noticeable
   as more files need to be compiled.
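To get a rough feel for the first cost, the sketch below compiles a single
source file with the standard library `py_compile` module and reports the
resulting file count and sizes. It is only an illustration run outside of
Bazel; the helper name `measure_precompile_overhead` and the sample source are
made up for this example, and the real `PyPrecompile` action may lay files out
differently.

```python
import py_compile
import tempfile
from pathlib import Path


def measure_precompile_overhead(source_text: str) -> dict:
    """Compile one source file and report the file names and byte sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.py"
        src.write_text(source_text)
        # Write the byte code next to the source so that there is exactly
        # one .pyc file for the one .py file.
        pyc_path = py_compile.compile(
            str(src), cfile=str(src.with_suffix(".pyc")), doraise=True
        )
        pyc = Path(pyc_path)
        return {
            "files": sorted(p.name for p in Path(tmp).iterdir()),
            "source_bytes": src.stat().st_size,
            "pyc_bytes": pyc.stat().st_size,
        }


report = measure_precompile_overhead(
    "def greet(name):\n    return 'hello ' + name\n"
)
print(report)
```

The exact byte counts depend on the Python version and the source, so treat
the numbers as indicative only.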

## Binary-level opt-in

Because of the costs of precompiling, it may not be feasible to enable it
globally for everything in your repo. For example, some binaries may be
particularly large, and doubling the number of runfiles isn't doable.

If this is the case, there's an alternative way to more selectively and
incrementally control precompiling on a per-binary basis.

To use this approach, the two basic steps are:
1. Prevent pyc files from being automatically added to runfiles:
   `--@rules_python//python/config_settings:precompile_add_to_runfiles=decided_elsewhere`.
2. Set the `pyc_collection` attribute on the binaries/tests that should or should
   not use precompiling.

The default for the `pyc_collection` attribute is controlled by a flag, so you
can use an opt-in or opt-out approach by setting the flag:
* targets must opt-out: `--@rules_python//python/config_settings:pyc_collection=include_pyc`
* targets must opt-in: `--@rules_python//python/config_settings:pyc_collection=disabled`
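The interaction between the per-target attribute and the flag can be modeled
with a few lines of Python. This is a simplified sketch, not the rules_python
implementation: the function name is hypothetical, and it assumes that an
attribute value of `inherit` means "use the flag's value".

```python
# Hypothetical model of how a binary's pyc_collection setting is resolved.
# "inherit" as the attribute default is an assumption made for this sketch.
INHERIT = "inherit"
VALID_VALUES = ("include_pyc", "disabled")


def resolve_pyc_collection(attr_value: str, flag_value: str) -> bool:
    """Return True if the binary would collect pyc files in this model."""
    # The attribute wins when set explicitly; otherwise fall back to the flag.
    effective = flag_value if attr_value == INHERIT else attr_value
    if effective not in VALID_VALUES:
        raise ValueError(f"unexpected pyc_collection value: {effective!r}")
    return effective == "include_pyc"


# Opt-out repo: the flag defaults targets to include_pyc.
opt_out_default = resolve_pyc_collection(INHERIT, "include_pyc")
opt_out_large_binary = resolve_pyc_collection("disabled", "include_pyc")

# Opt-in repo: the flag defaults targets to disabled.
opt_in_default = resolve_pyc_collection(INHERIT, "disabled")
opt_in_chosen_binary = resolve_pyc_collection("include_pyc", "disabled")

print(opt_out_default, opt_out_large_binary, opt_in_default, opt_in_chosen_binary)
```

In the opt-out setup only the explicitly disabled binary skips pyc files; in
the opt-in setup only the explicitly enabled binary gets them.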

## Advanced precompiler customization

The default implementation of the precompiler is a persistent, multiplexed,
sandbox-aware, cancellation-enabled, json-protocol worker that uses the same
interpreter as the target toolchain. This works well for local builds, but may
not work as well for remote execution builds. To customize the precompiler, two
mechanisms are available:

* The exec tools toolchain allows customizing the precompiler binary used with
  the `precompiler` attribute. Arbitrary binaries are supported.
* The execution requirements can be customized using
  `--@rules_python//tools/precompiler:execution_requirements`. This is a list
  flag that can be repeated. Each entry is a `key=value` pair that is added to
  the execution requirements of the `PyPrecompile` action. Note that this flag
  is specific to the rules_python precompiler. If a custom binary is used,
  this flag will have to be propagated from the custom binary using the
  `testing.ExecutionInfo` provider; refer to the `py_interpreter_program` an

The default precompiler implementation is an asynchronous/concurrent
implementation. If you find it has bugs or hangs, please report them. In the
meantime, the flag `--worker_extra_flag=PyPrecompile=--worker_impl=serial` can
be used to switch to a synchronous/serial implementation that may not perform
as well, but is less likely to have issues.

The `execution_requirements` keys of most relevance are:
* `supports-workers`: 1 or 0, to indicate if a regular persistent worker is
  desired.
* `supports-multiplex-workers`: 1 or 0, to indicate if a multiplexed persistent
  worker is desired.
* `requires-worker-protocol`: json or proto; the rules_python precompiler
  currently only supports json.
* `supports-multiplex-sandboxing`: 1 or 0, to indicate if sandboxing of the
  worker is supported.
* `supports-worker-cancellation`: 1 or 0, to indicate if requests to the worker
  can be cancelled.

Note that any execution requirement value can be specified in the flag.
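As a rough mental model of the `key=value` entries, the sketch below turns a
list of repeated flag values into a dictionary of execution requirements. The
function name is hypothetical, and the choice that a later entry overrides an
earlier one with the same key is an assumption of this sketch rather than
documented behavior.

```python
def parse_execution_requirements(entries):
    """Turn ["key=value", ...] entries into a dict of requirements."""
    requirements = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {entry!r}")
        # Assumption: a repeated key keeps the last value given.
        requirements[key] = value
    return requirements


# Example entries, as they might be passed by repeating the flag.
entries = [
    "supports-workers=1",
    "supports-multiplex-workers=0",
    "requires-worker-protocol=json",
]
parsed = parse_execution_requirements(entries)
print(parsed)
```

Values stay as strings here, mirroring how the keys listed above take `1`,
`0`, `json`, or `proto` as plain text.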

## Known issues, caveats, and idiosyncrasies

* Precompiling requires Bazel 7+ with the Pystar rule implementation enabled.
* Mixing rules_python PyInfo with Bazel builtin PyInfo will result in pyc files
  being dropped.
* Precompiled files may not be used in certain cases prior to Python 3.11. This
  occurs due to Python adding the directory of the binary's main `.py` file to
  `sys.path`, which causes the module to be found in the workspace source
  directory instead of within the binary's runfiles directory (where the pyc
  files are). This can usually be worked around by removing `sys.path[0]` (or
  otherwise ensuring the runfiles directory comes before the repo's source
  directory in `sys.path`).
* The pyc filename does not include the optimization level (i.e. there is no
  `opt-N` component as in `foo.cpython-39.opt-2.pyc`). This works fine (it's
  all byte code), but also means the interpreter `-O` argument can't be used;
  doing so will cause the interpreter to look for the non-existent `opt-N`
  named files.
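Two of the caveats above can be illustrated with the standard library. The
sketch first uses `importlib.util.cache_from_source` to show how the name the
interpreter looks for changes with the optimization level, and then shows one
possible shape of the `sys.path[0]` workaround. The helper
`without_script_dir` and the example paths are hypothetical; how a real binary
applies the workaround depends on its entry point.

```python
import importlib.util

# Name the interpreter looks for without -O, and with -OO (level 2).
plain_name = importlib.util.cache_from_source("foo.py", optimization="")
opt2_name = importlib.util.cache_from_source("foo.py", optimization=2)
print(plain_name)
print(opt2_name)


def without_script_dir(path_entries, script_dir):
    """Return the path entries with a leading script directory removed."""
    # Python puts the main script's directory first in sys.path, which is
    # the entry the caveat above suggests removing.
    if path_entries and path_entries[0] == script_dir:
        return list(path_entries[1:])
    return list(path_entries)


# Hypothetical paths, used only to show the effect on the search order.
example_path = ["/workspace/app", "/runfiles/app"]
fixed_path = without_script_dir(example_path, "/workspace/app")
print(fixed_path)
```

In a real program the same idea would be applied to `sys.path` itself, early
in the main module and before the affected imports happen.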