blob: f08f7fb7a7b1c4be8d9c82dacbcd19b0b2175959 [file] [log] [blame] [view]
# Using dependencies from PyPI
Using PyPI packages (aka "pip install") involves two main steps.
1. [Installing third party packages](#installing-third-party-packages)
2. [Using third party packages as dependencies](#using-third-party-packages)
{#installing-third-party-packages}
## Installing third party packages
### Using bzlmod
To add pip dependencies to your `MODULE.bazel` file, use the `pip.parse`
extension, and call it to create the central external repo and individual wheel
external repos. Include in the `MODULE.bazel` the toolchain extension as shown
in the first bzlmod example above.
```starlark
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
hub_name = "my_deps",
python_version = "3.11",
requirements_lock = "//:requirements_lock_3_11.txt",
)
use_repo(pip, "my_deps")
```
For more documentation, including how the rules can update/create a requirements
file, see the bzlmod examples under the {gh-path}`examples` folder.
We are using a host-platform compatible toolchain by default to setup pip dependencies.
During the setup phase, we create some symlinks, which may be inefficient on Windows
by default. In that case use the following `.bazelrc` options to improve performance if
you have admin privileges:
```
startup --windows_enable_symlinks
```
This will enable symlinks on Windows and help with bootstrap performance of setting up the
hermetic host python interpreter on this platform. Linux and OSX users should see no
difference.
### Using a WORKSPACE file
To add pip dependencies to your `WORKSPACE`, load the `pip_parse` function and
call it to create the central external repo and individual wheel external repos.
```starlark
load("@rules_python//python:pip.bzl", "pip_parse")
# Create a central repo that knows about the dependencies needed from
# requirements_lock.txt.
pip_parse(
name = "my_deps",
requirements_lock = "//path/to:requirements_lock.txt",
)
# Load the starlark macro, which will define your dependencies.
load("@my_deps//:requirements.bzl", "install_deps")
# Call it to define repos for your requirements.
install_deps()
```
### pip rules
Note that since `pip_parse` is a repository rule and therefore executes pip at
WORKSPACE-evaluation time, Bazel has no information about the Python toolchain
and cannot enforce that the interpreter used to invoke pip matches the
interpreter used to run `py_binary` targets. By default, `pip_parse` uses the
system command `"python3"`. To override this, pass in the `python_interpreter`
attribute or `python_interpreter_target` attribute to `pip_parse`.
You can have multiple `pip_parse`s in the same workspace. Or use the pip
extension multiple times when using bzlmod. This configuration will create
multiple external repos that have no relation to one another and may result in
downloading the same wheels numerous times.
As with any repository rule, if you would like to ensure that `pip_parse` is
re-executed to pick up a non-hermetic change to your environment (e.g., updating
your system `python` interpreter), you can force it to re-execute by running
`bazel sync --only [pip_parse name]`.
{#using-third-party-packages}
## Using third party packages as dependencies
Each extracted wheel repo contains a `py_library` target representing
the wheel's contents. There are two ways to access this library. The
first uses the `requirement()` function defined in the central
repo's `//:requirements.bzl` file. This function maps a pip package
name to a label:
```starlark
load("@my_deps//:requirements.bzl", "requirement")
py_library(
name = "mylib",
srcs = ["mylib.py"],
deps = [
":myotherlib",
requirement("some_pip_dep"),
requirement("another_pip_dep"),
]
)
```
The reason `requirement()` exists is to insulate from
changes to the underlying repository and label strings. However, those
labels have become directly used, so aren't able to easily change regardless.
On the other hand, using `requirement()` has several drawbacks; see
[this issue][requirements-drawbacks] for an enumeration. If you don't
want to use `requirement()`, you can use the library
labels directly instead. For `pip_parse`, the labels are of the following form:
```starlark
@{name}_{package}//:pkg
```
Here `name` is the `name` attribute that was passed to `pip_parse` and
`package` is the pip package name with characters that are illegal in
Bazel label names (e.g. `-`, `.`) replaced with `_`. If you need to
update `name` from "old" to "new", then you can run the following
buildozer command:
```shell
buildozer 'substitute deps @old_([^/]+)//:pkg @new_${1}//:pkg' //...:*
```
[requirements-drawbacks]: https://github.com/bazelbuild/rules_python/issues/414
### 'Extras' dependencies
Any 'extras' specified in the requirements lock file will be automatically added
as transitive dependencies of the package. In the example above, you'd just put
`requirement("useful_dep")`.
### Packaging cycles
Sometimes PyPi packages contain dependency cycles -- for instance `sphinx`
depends on `sphinxcontrib-serializinghtml`. When using them as `requirement()`s,
ala
```
py_binary(
name = "doctool",
...
deps = [
requirement("sphinx"),
]
)
```
Bazel will protest because it doesn't support cycles in the build graph --
```
ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
//:doctool (...)
@pypi//sphinxcontrib_serializinghtml:pkg (...)
.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
| @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
| @pypi_sphinx//:pkg (...)
| @pypi_sphinx//:_pkg (...)
`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
```
The `experimental_requirement_cycles` argument allows you to work around these
issues by specifying groups of packages which form cycles. `pip_parse` will
transparently fix the cycles for you and provide the cyclic dependencies
simultaneously.
```
pip_parse(
...
experimental_requirement_cycles = {
"sphinx": [
"sphinx",
"sphinxcontrib-serializinghtml",
]
},
)
```
`pip_parse` supports fixing multiple cycles simultaneously, however cycles must
be distinct. `apache-airflow` for instance has dependency cycles with a number
of its optional dependencies, which means those optional dependencies must all
be a part of the `airflow` cycle. For instance --
```
pip_parse(
...
experimental_requirement_cycles = {
"airflow": [
"apache-airflow",
"apache-airflow-providers-common-sql",
"apache-airflow-providers-postgres",
"apache-airflow-providers-sqlite",
]
}
)
```
Alternatively, one could resolve the cycle by removing one leg of it.
For example while `apache-airflow-providers-sqlite` is "baked into" the Airflow
package, `apache-airflow-providers-postgres` is not and is an optional feature.
Rather than listing `apache-airflow[postgres]` in your `requirements.txt` which
would expose a cycle via the extra, one could either _manually_ depend on
`apache-airflow` and `apache-airflow-providers-postgres` separately as
requirements. Bazel rules which need only `apache-airflow` can take it as a
dependency, and rules which explicitly want to mix in
`apache-airflow-providers-postgres` now can.
Alternatively, one could use `rules_python`'s patching features to remove one
leg of the dependency manually. For instance by making
`apache-airflow-providers-postgres` not explicitly depend on `apache-airflow` or
perhaps `apache-airflow-providers-common-sql`.
## Consuming Wheel Dists Directly
If you need to depend on the wheel dists themselves, for instance, to pass them
to some other packaging tool, you can get a handle to them with the
`whl_requirement` macro. For example:
```starlark
filegroup(
name = "whl_files",
data = [
whl_requirement("boto3"),
]
)
```