diff --git a/purldb-toolkit/README.rst b/purldb-toolkit/README.rst index cb70a999..7c85e360 100644 --- a/purldb-toolkit/README.rst +++ b/purldb-toolkit/README.rst @@ -1,102 +1,649 @@ purldb-toolkit ============== -purldb-toolkit is command line utility and library to use the PurlDB, its API and various related libraries. +.. contents:: :local: + :depth: 3 -The ``purlcli`` command acts as a client to the PurlDB REST API end point(s) to expose PURL services. -It serves both as a tool, as a library and as an example on how to use the services programmatically. +purldb-toolkit is a command line utility and library to use the PurlDB, its API and various related libraries. + +purldb-toolkit exposes the ``purlcli`` command that acts as a client to the PURL libraries, +the PurlDB and MatchCode REST API and exposes various PURL-based services. + +purldb-toolkit serves as a tool, a library and an example of how to use the services programmatically. Installation ------------ +.. code-block:: console + pip install purldb-toolkit Usage ----- -Use this command to get basic help:: - - $ purlcli --help - Usage: purlcli [OPTIONS] COMMAND [ARGS]... +The purlcli command exposes multiple subcommands. Run this to command to get basic help: - Return information from a PURL. +.. code-block:: console + purlcli --help + Usage: purlcli [OPTIONS] COMMAND [ARGS]... + + Return information for a PURL or list of PURLs. + Options: --help Show this message and exit. - + Commands: - metadata Given one or more PURLs, for each PURL, return a mapping of... - urls Given one or more PURLs, for each PURL, return a list of all... - validate Check the syntax of one or more PURLs. - versions Given one or more PURLs, return a list of all known versions... + d2d Run deploy-to-devel "back2source" analysis between packages. + metadata Fetch package metadata for a PURL. + urls Return known URLs for a PURL. + validate Validate PURL syntax and existence. + versions List all known versions for a PURL. + -And the following subcommands: +The purlcli exposes the following subcommands: -- Validate a PURL:: +- validate Validate PURL syntax. +- metadata Fetch package metadata for a PURL. +- urls Return known URLs for a PURL. +- versions List all known versions for a PURL. +- d2d Run deploy-to-devel between packages. - $ purlcli validate --help - Usage: purlcli validate [OPTIONS] - Check the syntax of one or more PURLs. +Each subcommand use the same set of options:: Options: - --purl TEXT PackageURL or PURL. - --output FILENAME Write validation output as JSON to FILE. [required] - --file FILENAME Read a list of PURLs from a FILE, one per line. - --help Show this message and exit. + --purl PURL Package-URL or PURL. + --output FILE Write output as JSON to FILE. Default is to print on screen. + [default: -] + --file FILE Read a list of PURLs from a FILE, one per line. + --help Show this message and exit. -- Collect package versions for a PURL:: +``validate``: validate a PURL +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - $ purlcli versions --help - Usage: purlcli versions [OPTIONS] +This command validates a PURL in two ways: - Given one or more PURLs, return a list of all known versions for each PURL. +* validate that the PURL syntax is correct +* validate that the PURL points to an existing package querying the PurlDB and upstream + package registries as needed. - Version information is not needed in submitted PURLs and if included will be - removed before processing. - Options: - --purl TEXT PackageURL or PURL. - --output FILENAME Write versions output as JSON to FILE. [required] - --file FILENAME Read a list of PURLs from a FILE, one per line. - --help Show this message and exit. +Examples +######## +**Submit multiple PURLs using the command line:** -- Collect package metadata for a PURL:: +.. code-block:: console - $ purlcli metadata --help - Usage: purlcli metadata [OPTIONS] + purlcli validate --purl pkg:npm/canonical-path@1.0.0 --purl pkg:nginx/nginx@0.8.9 --output - Given one or more PURLs, for each PURL, return a mapping of metadata fetched - from the fetchcode package.py info() function. +*Sample output:* - Options: - --purl TEXT PackageURL or PURL. - --output FILENAME Write meta output as JSON to FILE. [required] - --file FILENAME Read a list of PURLs from a FILE, one per line. - --unique Return data only for unique PURLs. - --help Show this message and exit. +.. code-block:: json + { + "headers": [ + { + "tool_name": "purlcli", + "tool_version": "0.2.0", + "options": { + "command": "validate", + "--purl": [ + "pkg:npm/canonical-path@1.0.0", + "pkg:nginx/nginx@0.8.9" + ], + "--file": null, + "--output": "" + }, + "errors": [], + "warnings": [ + "'check_existence' is not supported for 'pkg:nginx/nginx@0.8.9'" + ] + } + ], + "packages": [ + { + "purl": "pkg:npm/canonical-path@1.0.0", + "valid": true, + "exists": true, + "message": "The provided Package URL is valid, and the package exists in the upstream repo." + }, + { + "purl": "pkg:nginx/nginx@0.8.9", + "valid": true, + "exists": null, + "message": "The provided PackageURL is valid, but `check_existence` is not supported for this package type." + } + ] + } -- Collect package URLs for a PURL:: - $ purlcli urls --help - Usage: purlcli urls [OPTIONS] +**Submit multiple PURLs using a .txt file:** - Given one or more PURLs, for each PURL, return a list of all known URLs - fetched from the packageurl-python purl2url.py code. +.. code-block:: console - Options: - --purl TEXT PackageURL or PURL. - --output FILENAME Write urls output as JSON to FILE. [required] - --file FILENAME Read a list of PURLs from a FILE, one per line. - --unique Return data only for unique PURLs. - --head Validate each URL's existence with a head request. - --help Show this message and exit. + purlcli validate --file --output + +*Sample input.txt:* + +.. code-block:: text + + pkg:npm/canonical-path@1.0.0 + pkg:nginx/nginx@0.8.9 + + +Details +####### + +``validate`` calls the ``validate/`` endpoint of the `purldb API `_. + +See also https://public.purldb.io/api/docs/#/validate for details. + + +---- + + +``versions``: collect package versions for a PURL +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This command collects and return a list of all the known versions of a PURL by querying the PurlDB +and upstream package registries as needed. + + +This command collects and return a list of all the known versions of a PURL by querying the PurlDB +and upstream package registries as needed. + + +Examples +######## + +**Submit multiple PURLs using the command line:** + +.. code-block:: console + + purlcli versions --purl pkg:npm/canonical-path --purl pkg:nginx/nginx --output + +*Sample output:* + +.. code-block:: json + + { + "headers": [ + { + "tool_name": "purlcli", + "tool_version": "0.2.0", + "options": { + "command": "versions", + "--purl": [ + "pkg:npm/canonical-path", + "pkg:nginx/nginx" + ], + "--file": null, + "--output": "" + }, + "errors": [], + "warnings": [ + "'pkg:nginx/nginx' not supported with `versions` command" + ] + } + ], + "packages": [ + { + "purl": "pkg:npm/canonical-path@0.0.1", + "version": "0.0.1", + "release_date": "2013-12-19" + }, + { + "purl": "pkg:npm/canonical-path@0.0.2", + "version": "0.0.2", + "release_date": "2013-12-19" + }, + { + "purl": "pkg:npm/canonical-path@1.0.0", + "version": "1.0.0", + "release_date": "2018-10-24" + } + ] + } + + +Details +####### + +``versions`` calls ``versions()`` from `fetchcode/package_versions.py`. + +Version information is not needed in submitted PURLs and, if included, will be removed before processing. + + +---- + + +``metadata``: collect package metadata for a PURL +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This command collects and return the metadata for the package pointed by a PURL. It does so by +querying the PurlDB and upstream package registries as needed. + +The metadata consist of all available information found in the package manifest and package registry +or repository API. + +The schema is the schema used by ScanCode Toolkit, PurlDB and all other AboutCode projects. + +Examples +######## + +**Submit multiple PURLs using the command line:** + +.. code-block:: console + + purlcli metadata --purl pkg:openssl/openssl@3.0.6 --purl pkg:nginx/nginx@0.8.9 --purl pkg:gnu/glibc@2.38 --output + +*Sample output:* + +.. code-block:: json + + { + "headers": [ + { + "tool_name": "purlcli", + "tool_version": "0.2.0", + "options": { + "command": "metadata", + "--purl": [ + "pkg:openssl/openssl@3.0.6", + "pkg:nginx/nginx@0.8.9", + "pkg:gnu/glibc@2.38" + ], + "--file": null, + "--output": "" + }, + "errors": [], + "warnings": [ + "'check_existence' is not supported for 'pkg:openssl/openssl@3.0.6'", + "'pkg:nginx/nginx@0.8.9' not supported with `metadata` command", + "'check_existence' is not supported for 'pkg:gnu/glibc@2.38'" + ] + } + ], + "packages": [ + { + "purl": "pkg:openssl/openssl@3.0.6", + "type": "openssl", + "namespace": null, + "name": "openssl", + "version": "3.0.6", + "qualifiers": {}, + "subpath": null, + "primary_language": "C", + "description": null, + "release_date": "2022-10-11T12:39:09", + "parties": [], + "keywords": [], + "homepage_url": "https://www.openssl.org", + "download_url": "https://github.com/openssl/openssl/archive/refs/tags/openssl-3.0.6.tar.gz", + "api_url": "https://api.github.com/repos/openssl/openssl", + "size": null, + "sha1": null, + "md5": null, + "sha256": null, + "sha512": null, + "bug_tracking_url": "https://github.com/openssl/openssl/issues", + "code_view_url": "https://github.com/openssl/openssl", + "vcs_url": "git://github.com/openssl/openssl.git", + "copyright": null, + "license_expression": null, + "declared_license": "Apache-2.0", + "notice_text": null, + "root_path": null, + "dependencies": [], + "contains_source_code": null, + "source_packages": [], + "repository_homepage_url": null, + "repository_download_url": null, + "api_data_url": null + }, + { + "purl": "pkg:gnu/glibc@2.38", + "type": "gnu", + "namespace": null, + "name": "glibc", + "version": "2.38", + "qualifiers": {}, + "subpath": null, + "primary_language": null, + "description": null, + "release_date": "2023-07-31T17:34:00", + "parties": [], + "keywords": [], + "homepage_url": "https://ftp.gnu.org/pub/gnu/glibc/", + "download_url": "https://ftp.gnu.org/pub/gnu/glibc/glibc-2.38.tar.gz", + "api_url": null, + "size": null, + "sha1": null, + "md5": null, + "sha256": null, + "sha512": null, + "bug_tracking_url": null, + "code_view_url": null, + "vcs_url": null, + "copyright": null, + "license_expression": null, + "declared_license": null, + "notice_text": null, + "root_path": null, + "dependencies": [], + "contains_source_code": null, + "source_packages": [], + "repository_homepage_url": null, + "repository_download_url": null, + "api_data_url": null + } + ] + } + + +Details +####### + +``metadata`` calls ``info()`` from `fetchcode/package.py`. + +The intended output for each PURL type supported by the ``metadata`` command is: + +- an input PURL with a version: output the metadata for the input version +- an input PURL without a version: output a list of the metadata for all versions + + +---- + + +``urls``: collect package URLs for a PURL +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This command collects and return the known URL for a PURL. It does so by based on package type/ecosystem +conventions. It optionally also checks if the inferred URLs exists on the web. + +Examples +######## + +**Submit multiple PURLs using the command line:** + +.. code-block:: console + + purlcli urls --purl pkg:npm/canonical-path@1.0.0 --purl pkg:nginx/nginx@0.8.9 --purl pkg:rubygems/rails@7.0.0 --output + +*Sample output:* + +.. code-block:: json + + { + "headers": [ + { + "tool_name": "purlcli", + "tool_version": "0.2.0", + "options": { + "command": "urls", + "--purl": [ + "pkg:npm/canonical-path@1.0.0", + "pkg:nginx/nginx@0.8.9", + "pkg:rubygems/rails@7.0.0" + ], + "--file": null, + "--output": "" + }, + "errors": [], + "warnings": [ + "'pkg:nginx/nginx@0.8.9' not supported with `urls` command", + "'check_existence' is not supported for 'pkg:rubygems/rails@7.0.0'" + ] + } + ], + "packages": [ + { + "purl": "pkg:npm/canonical-path@1.0.0", + "download_url": "http://registry.npmjs.org/canonical-path/-/canonical-path-1.0.0.tgz", + "inferred_urls": [ + "https://www.npmjs.com/package/canonical-path/v/1.0.0", + "http://registry.npmjs.org/canonical-path/-/canonical-path-1.0.0.tgz" + ], + "repository_download_url": null, + "repository_homepage_url": "https://www.npmjs.com/package/canonical-path/v/1.0.0" + }, + { + "purl": "pkg:rubygems/rails@7.0.0", + "download_url": "https://rubygems.org/downloads/rails-7.0.0.gem", + "inferred_urls": [ + "https://rubygems.org/gems/rails/versions/7.0.0", + "https://rubygems.org/downloads/rails-7.0.0.gem" + ], + "repository_download_url": null, + "repository_homepage_url": "https://rubygems.org/gems/rails/versions/7.0.0" + } + ] + } + + +**Include head and get requests:** + +``--head`` + +.. code-block:: console + + purlcli urls --purl pkg:npm/canonical-path@1.0.0 --purl pkg:nginx/nginx@0.8.9 --purl pkg:rubygems/rails@7.0.0 --output --head + +*Sample output:* + +.. code-block:: json + + { + "headers": [ + { + "tool_name": "purlcli", + "tool_version": "0.2.0", + "options": { + "command": "urls", + "--purl": [ + "pkg:npm/canonical-path@1.0.0", + "pkg:nginx/nginx@0.8.9", + "pkg:rubygems/rails@7.0.0" + ], + "--file": null, + "--head": true, + "--output": "" + }, + "errors": [], + "warnings": [ + "'pkg:nginx/nginx@0.8.9' not supported with `urls` command", + "'check_existence' is not supported for 'pkg:rubygems/rails@7.0.0'" + ] + } + ], + "packages": [ + { + "purl": "pkg:npm/canonical-path@1.0.0", + "download_url": { + "url": "http://registry.npmjs.org/canonical-path/-/canonical-path-1.0.0.tgz", + "get_request_status_code": 200, + "head_request_status_code": 301 + }, + "inferred_urls": [ + { + "url": "https://www.npmjs.com/package/canonical-path/v/1.0.0", + "get_request_status_code": 200, + "head_request_status_code": 200 + }, + { + "url": "http://registry.npmjs.org/canonical-path/-/canonical-path-1.0.0.tgz", + "get_request_status_code": 200, + "head_request_status_code": 301 + } + ], + "repository_download_url": { + "url": null, + "get_request_status_code": "N/A", + "head_request_status_code": "N/A" + }, + "repository_homepage_url": { + "url": "https://www.npmjs.com/package/canonical-path/v/1.0.0", + "get_request_status_code": 200, + "head_request_status_code": 200 + } + }, + { + "purl": "pkg:rubygems/rails@7.0.0", + "download_url": { + "url": "https://rubygems.org/downloads/rails-7.0.0.gem", + "get_request_status_code": 200, + "head_request_status_code": 200 + }, + "inferred_urls": [ + { + "url": "https://rubygems.org/gems/rails/versions/7.0.0", + "get_request_status_code": 200, + "head_request_status_code": 200 + }, + { + "url": "https://rubygems.org/downloads/rails-7.0.0.gem", + "get_request_status_code": 200, + "head_request_status_code": 200 + } + ], + "repository_download_url": { + "url": null, + "get_request_status_code": "N/A", + "head_request_status_code": "N/A" + }, + "repository_homepage_url": { + "url": "https://rubygems.org/gems/rails/versions/7.0.0", + "get_request_status_code": 200, + "head_request_status_code": 200 + } + } + ] + } + + + + +``d2d``: Run a deployed code to development code analysis +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This command runs deploy-to-devel aka. "back2source" analysis between packages. + +Its behavior depends on the number of --purl options and their values. + +- With a single PURL, run the deploy-to-devel between all the PURLs of the set of PURLs that + this PURL belongs to. + +- With two PURLs, run the deploy-to-devel between these two PURLs. The first is the "from" PURL, + and the second is the "to" PURL. The first or "from" PURL is typically the source code or version + control checkout. The second or "to" PURL is the target of a build or transformnation such as a + binary, or a source archive. + +- You can also provide two HTTP URLs instead of PURLs and use these as direct download URLs. + +This command waits for the run to complete and save results to the `output` FILE. + + + +Examples +######## + +**Run a d2d analysis between two Java JARs (source and binary)** + +You first need to install and run matchcode locally so you have the endpoint accessible. Starting +from a https://github.com/nexB/purldb/ clone:: + + git clone https://github.com/nexB/purldb + cd purldb + make dev + make envfile + SECRET_KEY="1" make postgres_matchcodeio + SECRET_KEY="1" make run_matchcodeio + +Then in another shell:: + + cd purldb + source venv/bin/activate + +Finally run the command: + +.. code-block:: console + + purlcli d2d \ + --purl https://repo1.maven.org/maven2/org/apache/htrace/htrace-core/4.0.0-incubating/htrace-core-4.0.0-incubating-sources.jar \ + --purl https://repo1.maven.org/maven2/org/apache/htrace/htrace-core/4.0.0-incubating/htrace-core-4.0.0-incubating.jar \ + --matchcode-api-url http://127.0.0.1:8002/api/ + +*Sample output:* + +Here you can see that there are over 730 resources that require review and that may be present in the +binary and not present in the sources. + +.. code-block:: json + + { + "url": "http://127.0.0.1:8002/api/d2d/5d9dbcca-48f0-4788-a356-29196f785c52/", + "uuid": "5d9dbcca-48f0-4788-a356-29196f785c52", + "created_date": "2024-06-04T16:31:24.879808Z", + "input_sources": [ + { + "uuid": "6b459edd-6b8b-473a-add7-cc79152b4d5e", + "filename": "htrace-core-4.0.0-incubating-sources.jar", + "download_url": "https://repo1.maven.org/maven2/org/apache/htrace/htrace-core/4.0.0-incubating/htrace-core-4.0.0-incubating-sources.jar#from", + "is_uploaded": false, + "tag": "from", + "size": 42766, + "is_file": true, + "exists": true + }, + { + "uuid": "bb811a08-ea8c-46b4-8720-865f068ecc0d", + "filename": "htrace-core-4.0.0-incubating.jar", + "download_url": "https://repo1.maven.org/maven2/org/apache/htrace/htrace-core/4.0.0-incubating/htrace-core-4.0.0-incubating.jar#to", + "is_uploaded": false, + "tag": "to", + "size": 1485031, + "is_file": true, + "exists": true + } + ], + "runs": [ + "8689ba05-3859-4eab-b2cf-9bec1495629f" + ], + "resource_count": 849, + "package_count": 1, + "dependency_count": 0, + "relation_count": 37, + "codebase_resources_summary": { + "ignored-directory": 56, + "mapped": 37, + "not-deployed": 1, + "requires-review": 730, + "scanned": 25 + }, + "discovered_packages_summary": { + "total": 1, + "with_missing_resources": 0, + "with_modified_resources": 0 + }, + "discovered_dependencies_summary": { + "total": 0, + "is_runtime": 0, + "is_optional": 0, + "is_resolved": 0 + }, + "codebase_relations_summary": { + "java_to_class": 34, + "sha1": 3 + }, + "codebase_resources_discrepancies": { + "total": 730 + } + } Funding