Expose a rules_license PackageInfo from imported dependencies #2054

Open
shs96c opened this issue Jul 10, 2024 · 12 comments

Comments

@shs96c
Contributor

shs96c commented Jul 10, 2024

pip_parse allows us to import third-party dependencies, but the imports lack enough information for us to generate an SBOM. It would be useful if targets imported from third-party Python deps were annotated with a PackageInfo from rules_license (notably, the purl is incredibly useful for generating CycloneDX-format SBOMs).

While it might be possible to add this information in a custom way, adopting rules_license allows SBOMs to be generated without adding special logic to each ruleset.
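To illustrate why the purl matters for CycloneDX, here is a minimal Python sketch of a single CycloneDX component entry. The helper name `cyclonedx_component` and the package name, version and purl values are hypothetical examples; in practice these values would come from whatever PackageInfo the imported targets expose.

```python
import json


def cyclonedx_component(name: str, version: str, purl: str) -> dict:
    """Minimal CycloneDX component entry for one third-party Python package."""
    return {"type": "library", "name": name, "version": version, "purl": purl}


# A tiny BOM with one component; the values are illustrative only.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        cyclonedx_component("requests", "2.32.3", "pkg:pypi/requests@2.32.3"),
    ],
}
print(json.dumps(bom, indent=2))
```

The purl is what lets SBOM consumers match each component against vulnerability and license databases without ecosystem-specific logic.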

@groodt
Collaborator

groodt commented Jul 10, 2024

I've been keeping an eye on this too. It will likely be easier for Python once licensing is standardized in PEP 639, which will support SPDX expressions as part of Core Metadata 2.4.
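Once packages publish Core Metadata 2.4, the SPDX expression should be readable directly from a wheel's METADATA file, which uses an email-header format. A minimal sketch, assuming the `License-Expression` field name from PEP 639; the helper name and the sample metadata are hypothetical:

```python
from email.parser import HeaderParser
from typing import Optional


def license_expression(metadata_text: str) -> Optional[str]:
    """Return the SPDX License-Expression from METADATA text, or None if absent."""
    msg = HeaderParser().parsestr(metadata_text)
    return msg.get("License-Expression")


example = """Metadata-Version: 2.4
Name: example-pkg
Version: 1.0.0
License-Expression: MIT OR Apache-2.0
"""
print(license_expression(example))  # MIT OR Apache-2.0
```

Older metadata versions have no such field, so the function returns None for them and a fallback to the legacy fields would still be needed.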

@aignas
Collaborator

aignas commented Jul 10, 2024

+1 for this getting added to rules_python. If anyone wants to take a stab at it, I can answer questions about pip machinery and help in this way.

@rickeylev
Collaborator

Assuming we had the license information, does this boil down to adding load(<rules_license>, "licenses"); licenses(...) to the pip-generated BUILD file?

@arrdem
Contributor

arrdem commented Jul 10, 2024

I think so, plus being able to determine which license is appropriate from the package metadata. I was thinking a while back about whether this would be a useful addition, and the only constraint I can think of is rules_license stability. We aren't using SBOMs today, and I've got my own automation that looks at package metadata to, say, prohibit GPL licenses, but this feature feels worthwhile.
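A minimal sketch of the kind of GPL check described above, assuming the license hints live in the `License-Expression` field, the free-text `License` field, and the `License ::` trove classifiers of a METADATA file. The helper names and the regular expression are illustrative, not the actual automation mentioned; note that the pattern also matches LGPL and AGPL.

```python
import re
from email.parser import HeaderParser

# Illustrative pattern: matches "GPL", "LGPL", "AGPL", "GPLv3" and the
# "GNU General Public License" classifier wording, case-insensitively.
GPL_PATTERN = re.compile(r"GPL|General Public License", re.IGNORECASE)


def license_hints(metadata_text: str) -> list:
    """Collect every license-related value from METADATA text."""
    msg = HeaderParser().parsestr(metadata_text)
    hints = [v for v in (msg.get("License-Expression"), msg.get("License")) if v]
    classifiers = msg.get_all("Classifier") or []
    hints.extend(c for c in classifiers if c.startswith("License ::"))
    return hints


def is_gpl_licensed(metadata_text: str) -> bool:
    """Return True if any license hint looks like a GPL-family license."""
    return any(GPL_PATTERN.search(hint) for hint in license_hints(metadata_text))


gpl_sample = """Metadata-Version: 2.1
Name: example-gpl
Version: 0.1
License: GPLv3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
"""
mit_sample = """Metadata-Version: 2.1
Name: example-mit
Version: 0.1
Classifier: License :: OSI Approved :: MIT License
"""
print(is_gpl_licensed(gpl_sample))  # True
print(is_gpl_licensed(mit_sample))  # False
```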

@rickeylev
Collaborator

Yeah, +1 to the overall feature. It should be easy to add the loads for rules_license.

From what groodt said, it sounds like the hard part will be getting the license info from whatever artifact was downloaded from PyPI?

@groodt
Collaborator

groodt commented Jul 11, 2024

> From what groodt said, it sounds like the hard part will be getting the license info from whatever artifact was downloaded from PyPI?

Yes. It's quite messy at the moment. You can grab some license info, but it's all over the place. From the PEP:

"""
This has triggered a number of license-related discussions and issues, including on outdated and ambiguous PyPI classifiers, license interoperability with other ecosystems, too many confusing license metadata options, limited support for license files in the Wheel project, and the lack of precise license metadata.

As a result, on average, Python packages tend to have more ambiguous and missing license information than other common ecosystems.
"""

As is typical for Python, due to its age, a lot of this is messier than in some other language ecosystems. I think the PEP is going to be accepted, though.

@shs96c
Contributor Author

shs96c commented Jul 11, 2024

This request is for the PackageInfo, which doesn't need the license at all, even though the provider comes from rules_license. Hopefully we can expose the PackageInfo before we need to expose the rest of the information?

@groodt
Collaborator

groodt commented Jul 12, 2024

Ah, got it. Is there an example from other rules? My understanding of the purl spec is that you can easily build it from the discrete components.

pkg:pypi/<name>@<version>

https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst
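Following the PURL-TYPES rules for the `pypi` type (lowercase the name, replace `_` with `-`), building the purl from name and version takes only a few lines. A minimal Python sketch; `pypi_purl` is a hypothetical helper, not an existing rules_python function:

```python
def pypi_purl(name: str, version: str) -> str:
    """Build a purl for a PyPI package from its name and version.

    Follows the purl-spec PURL-TYPES rule for "pypi": lowercase the name and
    replace "_" with "-". Percent-encoding of unusual characters is omitted
    to keep the sketch short.
    """
    normalized_name = name.lower().replace("_", "-")
    return f"pkg:pypi/{normalized_name}@{version}"


print(pypi_purl("Django_Allauth", "12.23"))  # pkg:pypi/django-allauth@12.23
```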

It's unclear to me what should happen if a mirror or local/fork of a package is used instead of a canonical package from pypi for example.

@shs96c
Contributor Author

shs96c commented Jul 12, 2024

The purl is easy to construct, and all the required information should be available to the repo rules. rules_rust adds a PackageInfo to the targets it generates, but it's somewhat obfuscated: https://github.com/bazelbuild/rules_rust/blob/c177ccc1a75b11badd984c01e51e61c840c572d8/crate_universe/src/rendering.rs#L347

I plan on adding this to rules_jvm_external soon too.
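As a rough illustration of the required information being available to the repo rules: the name and version a purl needs can be recovered from a wheel's filename, whose components are `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` with `-` inside each component escaped to `_`. The sketch is Python rather than the Starlark a repo rule would actually use, `parse_wheel_filename` is a hypothetical helper, and sdists are not handled.

```python
from typing import NamedTuple


class WheelId(NamedTuple):
    name: str
    version: str


def parse_wheel_filename(filename: str) -> WheelId:
    """Extract the distribution name and version from a wheel filename.

    Because "-" inside the name or version is escaped as "_", splitting the
    stem on "-" yields 5 parts, or 6 when an optional build tag is present.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelId(name=parts[0], version=parts[1])


print(parse_wheel_filename("requests-2.32.3-py3-none-any.whl"))
# WheelId(name='requests', version='2.32.3')
```

The parsed name still needs the PyPI purl normalization (lowercase, `_` to `-`) before it goes into a purl.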

@groodt
Collaborator

groodt commented Jul 12, 2024

Is the prefix for Python packages always pkg:pypi/?

It seems the only necessary metadata to be added here are name and version?

Is py_library the appropriate place for the metadata? Does it only apply to dependencies fetched from an index? What about vendored libraries?

@rickeylev
Collaborator

> Is py_library the appropriate place for the metadata?

I don't think so. From what I understand, the way the license stuff works is you specify a package-level value, e.g. package(default_applicable_licenses=[":license"])[1], and the :license target has various license info. Targets in the package automatically inherit the settings.

[1] Though I could swear they renamed this to something like "default_metadata".
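A minimal sketch of what a generator might emit for a wheel repository, following the package-level mechanism described above, written in Python as a string template. The load label `@rules_license//rules:package_info.bzl` and the `package_name`/`package_version` attributes are assumed to match rules_license's package_info rule; the `purl` attribute and placing a package_info target in `default_applicable_licenses` are further assumptions to verify against the rules_license and Bazel versions in use. The function name is hypothetical.

```python
def render_package_metadata(name: str, version: str, purl: str) -> str:
    """Render BUILD-file text attaching package metadata to every target in
    a generated wheel repository's package.

    ASSUMPTIONS (verify against the rules_license/Bazel versions in use):
      * package_info is loaded from @rules_license//rules:package_info.bzl;
      * it accepts package_name, package_version and purl attributes;
      * default_applicable_licenses applies it package-wide.
    """
    return "\n".join([
        'load("@rules_license//rules:package_info.bzl", "package_info")',
        "",
        # package() must come after loads and before any rule in the file.
        'package(default_applicable_licenses = [":package_info"])',
        "",
        "package_info(",
        '    name = "package_info",',
        f'    package_name = "{name}",',
        f'    package_version = "{version}",',
        f'    purl = "{purl}",',
        ")",
        "",
    ])


print(render_package_metadata("requests", "2.32.3", "pkg:pypi/requests@2.32.3"))
```

Attaching the metadata at package level leaves the generated py_library targets themselves unchanged.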

@shs96c
Contributor Author

shs96c commented Jul 13, 2024

When constructing an SBOM, having the PackageInfo on the py_library would be incredibly helpful, as it would avoid the need to use an aspect to tie the PackageInfo to the library.

You can also attach extra information to a purl as qualifiers, such as the repository_url and checksum, both of which can be handy.
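A minimal sketch of appending qualifiers to an existing purl. Per the purl spec, qualifiers follow a `?` as `key=value` pairs joined by `&`, with keys lowercased and sorted; `repository_url` is also one way to record the mirror case raised earlier. The helper name, mirror URL and checksum value are hypothetical, and percent-encoding is simplified (`:` and `/` are left unescaped alongside the usual unreserved characters).

```python
from urllib.parse import quote


def add_qualifiers(base_purl: str, qualifiers: dict) -> str:
    """Append purl qualifiers such as repository_url or checksum.

    Empty values are dropped, keys are lowercased and sorted, and values are
    percent-encoded with ":" and "/" kept as-is (a simplification of the spec).
    """
    pairs = sorted((key.lower(), value) for key, value in qualifiers.items() if value)
    if not pairs:
        return base_purl
    encoded = "&".join(f"{key}={quote(value, safe=':/')}" for key, value in pairs)
    return f"{base_purl}?{encoded}"


print(add_qualifiers(
    "pkg:pypi/requests@2.32.3",
    {
        "repository_url": "https://pypi.example.com/simple",
        "checksum": "sha256:0123abcd",
    },
))
# pkg:pypi/requests@2.32.3?checksum=sha256:0123abcd&repository_url=https://pypi.example.com/simple
```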

5 participants