Using PyPI packages (aka “pip install”) involves two main steps.
To add pip dependencies to your `MODULE.bazel` file, use the `pip.parse` extension and call it to create the central external repo and individual wheel external repos. Also include the toolchain extension in `MODULE.bazel`, as shown in the first bzlmod example above.
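For reference, registering the toolchain takes a few lines of `MODULE.bazel` along these lines (a minimal sketch; the `rules_python` and Python version pins are illustrative):

```starlark
bazel_dep(name = "rules_python", version = "0.27.0")

python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(python_version = "3.11")
```

The pip dependencies themselves are then declared with `pip.parse`: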
```starlark
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
)
use_repo(pip, "my_deps")
```
For more documentation, including how the rules can update/create a requirements file, see the bzlmod examples under the {gh-path}`examples` folder.
By default, a host-platform-compatible toolchain is used to set up pip dependencies. The setup phase creates some symlinks, which can be inefficient on Windows. In that case, if you have admin privileges, use the following `.bazelrc` option to improve performance:
```
startup --windows_enable_symlinks
```
This enables symlinks on Windows and helps with the bootstrap performance of setting up the hermetic host Python interpreter on this platform. Linux and macOS users should see no difference.
To add pip dependencies to your `WORKSPACE`, load the `pip_parse` function and call it to create the central external repo and individual wheel external repos.
load("@rules_python//python:pip.bzl", "pip_parse") # Create a central repo that knows about the dependencies needed from # requirements_lock.txt. pip_parse( name = "my_deps", requirements_lock = "//path/to:requirements_lock.txt", ) # Load the starlark macro, which will define your dependencies. load("@my_deps//:requirements.bzl", "install_deps") # Call it to define repos for your requirements. install_deps()
Note that since `pip_parse` is a repository rule, and therefore executes pip at WORKSPACE-evaluation time, Bazel has no information about the Python toolchain and cannot enforce that the interpreter used to invoke pip matches the interpreter used to run `py_binary` targets. By default, `pip_parse` uses the system command `"python3"`. To override this, pass the `python_interpreter` or `python_interpreter_target` attribute to `pip_parse`.
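For example, to run pip with a hermetic interpreter registered via `python_register_toolchains`, you can point `python_interpreter_target` at that toolchain's interpreter. A sketch (the `interpreter` symbol in the generated `@python_3_11//:defs.bzl` follows the convention of recent `rules_python` releases):

```starlark
load("@rules_python//python:repositories.bzl", "python_register_toolchains")

python_register_toolchains(
    name = "python_3_11",
    python_version = "3.11",
)

load("@python_3_11//:defs.bzl", "interpreter")
load("@rules_python//python:pip.bzl", "pip_parse")

pip_parse(
    name = "my_deps",
    requirements_lock = "//path/to:requirements_lock.txt",
    # Run pip with the hermetic interpreter instead of the system "python3".
    python_interpreter_target = interpreter,
)
```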
You can have multiple `pip_parse`s in the same workspace, or use the pip extension multiple times when using bzlmod. This configuration will create multiple external repos that have no relation to one another and may result in downloading the same wheels multiple times.
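For instance, two unrelated resolutions in one `WORKSPACE` might look like this (hypothetical names and lock-file paths):

```starlark
# Each call resolves its own lock file and generates its own external repos.
pip_parse(
    name = "app_deps",
    requirements_lock = "//app:requirements_lock.txt",
)

pip_parse(
    name = "tools_deps",
    requirements_lock = "//tools:requirements_lock.txt",
)
```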
As with any repository rule, if you would like to ensure that `pip_parse` is re-executed to pick up a non-hermetic change to your environment (e.g., updating your system `python` interpreter), you can force it to re-execute by running `bazel sync --only [pip_parse name]`.
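With the example name above, that is:

```
bazel sync --only my_deps
```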
Each extracted wheel repo contains a `py_library` target representing the wheel's contents. There are two ways to access this library. The first uses the `requirement()` function defined in the central repo's `//:requirements.bzl` file. This function maps a pip package name to a label:
load("@my_deps//:requirements.bzl", "requirement") py_library( name = "mylib", srcs = ["mylib.py"], deps = [ ":myotherlib", requirement("some_pip_dep"), requirement("another_pip_dep"), ] )
The `requirement()` function exists to insulate callers from changes to the underlying repository and label strings. However, those labels are now widely used directly, so they cannot easily change regardless.
On the other hand, using `requirement()` has several drawbacks; see this issue for an enumeration. If you don't want to use `requirement()`, you can use the library labels directly instead. For `pip_parse`, the labels are of the following form:
```
@{name}_{package}//:pkg
```
Here `name` is the `name` attribute that was passed to `pip_parse` and `package` is the pip package name with characters that are illegal in Bazel label names (e.g. `-`, `.`) replaced with `_`. If you need to update `name` from “old” to “new”, then you can run the following buildozer command:
```
buildozer 'substitute deps @old_([^/]+)//:pkg @new_${1}//:pkg' //...:*
```
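As a sketch of the direct-label style, with `pip_parse(name = "my_deps")`, a dependency on the hypothetical pip package `some-pip-dep` (dashes mangled to underscores) would be written as:

```starlark
py_library(
    name = "mylib",
    srcs = ["mylib.py"],
    deps = [
        # Equivalent to requirement("some-pip-dep"), but spelled directly.
        "@my_deps_some_pip_dep//:pkg",
    ],
)
```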
Any ‘extras’ specified in the requirements lock file will be automatically added as transitive dependencies of the package, so you never name the extra in a `deps` list. For example, if your lock file pins `useful_dep[some_extra]`, you'd just put `requirement("useful_dep")`.
Sometimes PyPI packages contain dependency cycles; for instance, `sphinx` and `sphinxcontrib-serializinghtml` depend on each other. When using them as `requirement()`s, as in:
```starlark
py_binary(
    name = "doctool",
    ...
    deps = [
        requirement("sphinx"),
    ],
)
```
Bazel will protest because it doesn't support cycles in the build graph:

```
ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
    //:doctool (...)
    @pypi//sphinxcontrib_serializinghtml:pkg (...)
.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
|   @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
|   @pypi_sphinx//:pkg (...)
|   @pypi_sphinx//:_pkg (...)
`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
```
The `experimental_requirement_cycles` argument allows you to work around these issues by specifying groups of packages which form cycles. `pip_parse` will transparently fix the cycles for you and provide the cyclic dependencies simultaneously.
```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "sphinx": [
            "sphinx",
            "sphinxcontrib-serializinghtml",
        ],
    },
)
```
`pip_parse` supports fixing multiple cycles simultaneously, however cycles must be distinct. `apache-airflow`, for instance, has dependency cycles with a number of its optional dependencies, which means those optional dependencies must all be a part of the `airflow` cycle. For instance:
```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "airflow": [
            "apache-airflow",
            "apache-airflow-providers-common-sql",
            "apache-airflow-providers-postgres",
            "apache-airflow-providers-sqlite",
        ],
    },
)
```
Alternatively, one could resolve the cycle by removing one leg of it. For example, while `apache-airflow-providers-sqlite` is “baked into” the Airflow package, `apache-airflow-providers-postgres` is not and is an optional feature. Rather than listing `apache-airflow[postgres]` in your `requirements.txt`, which would expose a cycle via the extra, one could instead depend on `apache-airflow` and `apache-airflow-providers-postgres` separately as requirements. Bazel rules which need only `apache-airflow` can take it as a dependency, and rules which explicitly want to mix in `apache-airflow-providers-postgres` now can.
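In `requirements.txt` terms, the split looks something like this (shown unpinned for brevity; a lock file would pin exact versions):

```
# Instead of apache-airflow[postgres], which exposes the cycle via the extra:
apache-airflow
apache-airflow-providers-postgres
```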
Alternatively, one could use `rules_python`'s patching features to remove one leg of the dependency manually, for instance by making `apache-airflow-providers-postgres` not explicitly depend on `apache-airflow` or perhaps `apache-airflow-providers-common-sql`.
If you need to depend on the wheel dists themselves, for instance, to pass them to some other packaging tool, you can get a handle to them with the `whl_requirement` macro. For example:
```starlark
load("@my_deps//:requirements.bzl", "whl_requirement")

filegroup(
    name = "whl_files",
    data = [
        whl_requirement("boto3"),
    ],
)
```