fix: pypi parse_simpleapi_html.bzl is robust to metadata containing ">" sign (#2031)
This PR modifies the logic for finding the end of the <a>
tag metadata attributes in the pypi `parse_simpleapi_html` function.
This was discovered after investigation of the following error:
```
Error in repository_rule: invalid user-provided repo name 'pypi_311_=2_7,!=3_0_*,!=3_1_*,!=3_2_*">six_py2_none_any_8abb2f1d': valid names may contain only A-Z, a-z, 0-9, '-', '_', '.', and must start with a letter
```
which was traced back to a `data-requires-python` attribute
containing a literal `>` sign (instead of the escaped `&gt;`) in the
Azure Artifacts pypi feed, e.g.:
`<a
href="https://microsoft.pkgs.visualstudio.com/REDACTED_URL/six-1.16.0-py2.py3-none-any.whl#sha256=8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"
data-requires-python=">=2.7,!=3.0.*,!=3.1.*,!=3.2.*">six-1.16.0-py2.py3-none-any.whl</a><br/>`
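The difference between the old and new splitting logic can be sketched in plain Python, whose `str.partition` and `str.rpartition` behave like the Starlark calls used in the patch. This is only an illustration: the helper names `old_filename` and `new_filename`, the `example.org` URL, and the shortened hash are assumptions made for the example, not part of rules_python.

```python
# Illustrative sketch only; helper names and the sample line are hypothetical.

def old_filename(line):
    """Previous behaviour: cut at the first '>' after the href attribute."""
    _, _, tail = line.partition("#sha256=")
    _sha256, _, tail = tail.partition('"')
    _maybe_metadata, _, tail = tail.partition(">")
    filename, _, _ = tail.partition("<")
    return filename


def new_filename(line):
    """Patched behaviour: cut at the last '>' that precedes the closing </a>."""
    _, _, tail = line.partition("#sha256=")
    _sha256, _, tail = tail.partition('"')
    head, _, _ = tail.rpartition("</a>")
    _maybe_metadata, _, filename = head.rpartition(">")
    return filename


# A line shaped like the Azure Artifacts entry above, with an unescaped '>'.
LINE = (
    '<a href="https://example.org/six-1.16.0-py2.py3-none-any.whl#sha256=8abb2f1d" '
    'data-requires-python=">=2.7,!=3.0.*">six-1.16.0-py2.py3-none-any.whl</a><br/>'
)

if __name__ == "__main__":
    print("old:", old_filename(LINE))
    print("new:", new_filename(LINE))
```

With the old logic the version specifier leaks into the filename, which is consistent with the invalid repo name in the error above; anchoring on the closing `</a>` and then taking the last `>` before it isolates the filename even when attribute values contain `>`.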
---------
Co-authored-by: Mihai Dusmanu <mihaidusmanu@microsoft.com>
Co-authored-by: aignas <240938+aignas@users.noreply.github.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 61df086..ac11e14 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -25,6 +25,24 @@
[x.x.x]: https://github.com/bazelbuild/rules_python/releases/tag/x.x.x
### Changed
+* Nothing yet
+
+### Fixed
+* (rules) Fixes python builds when the `--build_python_zip` is set to `false` on Windows. See [#1840](https://github.com/bazelbuild/rules_python/issues/1840).
+* (pip) Fixed pypi parse_simpleapi_html function for feeds with package metadata
+ containing ">" sign
+
+### Added
+* Nothing yet
+
+### Removed
+* Nothing yet
+
+## [0.34.0] - 2024-07-04
+
+[0.34.0]: https://github.com/bazelbuild/rules_python/releases/tag/0.34.0
+
+### Changed
* `protobuf`/`com_google_protobuf` dependency bumped to `v24.4`
* (bzlmod): optimize the creation of config settings used in pip to
reduce the total number of targets in the hub repo.
@@ -49,7 +67,8 @@
"@platforms//os:linux": ["@pypi//foo_available_only_on_linux"],
"//conditions:default": [],
}
- )`.
+ )
+ ```
* (bzlmod): Targets in `all_requirements` now use the same form as targets returned by the `requirement` macro.
* (rules) Auto exec groups are enabled. This allows actions run by the rules,
such as precompiling, to pick an execution platform separately from what
@@ -67,8 +86,6 @@
### Added
* (toolchains) {obj}`//python/runtime_env_toolchains:all`, which is a drop-in
replacement for the "autodetecting" toolchain.
-
-### Added
* (gazelle) Added new `python_label_convention` and `python_label_normalization` directives. These directive
allows altering default Gazelle label format to third-party dependencies useful for re-using Gazelle plugin
with other rules, including `rules_pycross`. See [#1939](https://github.com/bazelbuild/rules_python/issues/1939).
diff --git a/python/private/pypi/parse_simpleapi_html.bzl b/python/private/pypi/parse_simpleapi_html.bzl
index f7cd032..2488469 100644
--- a/python/private/pypi/parse_simpleapi_html.bzl
+++ b/python/private/pypi/parse_simpleapi_html.bzl
@@ -49,6 +49,8 @@
# https://packaging.python.org/en/latest/specifications/simple-repository-api/#versioning-pypi-s-simple-api
fail("Unsupported API version: {}".format(api_version))
+ # Each line follows the following pattern
+ # <a href="https://...#sha256=..." attribute1="foo" ... attributeN="bar">filename</a><br />
for line in lines[1:]:
dist_url, _, tail = line.partition("#sha256=")
sha256, _, tail = tail.partition("\"")
@@ -56,8 +58,8 @@
# See https://packaging.python.org/en/latest/specifications/simple-repository-api/#adding-yank-support-to-the-simple-api
yanked = "data-yanked" in line
- maybe_metadata, _, tail = tail.partition(">")
- filename, _, tail = tail.partition("<")
+ head, _, _ = tail.rpartition("</a>")
+ maybe_metadata, _, filename = head.rpartition(">")
metadata_sha256 = ""
metadata_url = ""
diff --git a/tests/pypi/parse_simpleapi_html/parse_simpleapi_html_tests.bzl b/tests/pypi/parse_simpleapi_html/parse_simpleapi_html_tests.bzl
index a60bb1f..a532e87 100644
--- a/tests/pypi/parse_simpleapi_html/parse_simpleapi_html_tests.bzl
+++ b/tests/pypi/parse_simpleapi_html/parse_simpleapi_html_tests.bzl
@@ -61,6 +61,22 @@
yanked = False,
),
),
+ (
+ struct(
+ attrs = [
+ 'href="https://example.org/full-url/foo-0.0.1.tar.gz#sha256=deadbeefasource"',
+ 'data-requires-python=">=3.7"',
+ ],
+ filename = "foo-0.0.1.tar.gz",
+ url = "ignored",
+ ),
+ struct(
+ filename = "foo-0.0.1.tar.gz",
+ sha256 = "deadbeefasource",
+ url = "https://example.org/full-url/foo-0.0.1.tar.gz",
+ yanked = False,
+ ),
+ ),
]
for (input, want) in tests:
@@ -114,6 +130,26 @@
struct(
attrs = [
'href="https://example.org/full-url/foo-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=deadbeef"',
+ 'data-requires-python=">=3.7"',
+ 'data-dist-info-metadata="sha256=deadb00f"',
+ 'data-core-metadata="sha256=deadb00f"',
+ ],
+ filename = "foo-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
+ url = "ignored",
+ ),
+ struct(
+ filename = "foo-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
+ metadata_sha256 = "deadb00f",
+ metadata_url = "https://example.org/full-url/foo-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata",
+ sha256 = "deadbeef",
+ url = "https://example.org/full-url/foo-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
+ yanked = False,
+ ),
+ ),
+ (
+ struct(
+ attrs = [
+ 'href="https://example.org/full-url/foo-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=deadbeef"',
'data-requires-python=">=3.7"',
'data-core-metadata="sha256=deadb00f"',
],