feat: enable repo finder to support more languages via Open Source Insights (#388)

This feature modifies the Repo Finder so that it can be used from anywhere within Macaron, accepts PURL strings as input, and supports more languages via Google's Open Source Insights (deps.dev).

This enables Macaron to accept artifact PURLs as input, in which case the Repo Finder is used to attempt to retrieve the related repository.
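
To make the shape of an artifact PURL concrete, here is a minimal sketch of splitting one into its parts using only the standard library. It is a simplification for illustration, not Macaron's code: the diff shows Macaron importing `PackageURL` from the `packageurl` module for this purpose. The names `SimplePurl` and `parse_simple_purl` are hypothetical, qualifiers and subpaths are simply discarded, and the `pkg:npm/%40example/demo@1.0.0` string is a made-up example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class SimplePurl:
    """Reduced view of a PURL (hypothetical helper type, not part of Macaron)."""

    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_simple_purl(purl: str) -> SimplePurl:
    """Parse ``pkg:<type>/<namespace>/<name>@<version>``; qualifiers and subpaths are dropped."""
    scheme, sep, remainder = purl.partition(":")
    if not sep or scheme != "pkg":
        raise ValueError(f"Not a PURL: {purl!r}")

    # Discard the optional subpath (#...) and qualifiers (?...).
    remainder = remainder.split("#", 1)[0].split("?", 1)[0].strip("/")

    package_type, sep, path = remainder.partition("/")
    if not sep or not package_type or not path:
        raise ValueError(f"PURL is missing a type or a name: {purl!r}")

    # The version follows the last '@'. A literal '@' inside a namespace
    # (e.g. an npm scope) is percent-encoded as %40, so it is not confused here.
    version = ""
    if "@" in path:
        path, _, version = path.rpartition("@")

    # Everything before the last '/' is the namespace; it may be empty.
    namespace, _, name = path.rpartition("/")
    if not name:
        raise ValueError(f"PURL is missing a name: {purl!r}")

    return SimplePurl(
        type=package_type.lower(),
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=unquote(version) or None,
    )


if __name__ == "__main__":
    # Repository PURL taken from the error message in this commit.
    print(parse_simple_purl("pkg:github/micronaut-projects/micronaut-core"))
    # Hypothetical scoped npm artifact with a version.
    print(parse_simple_purl("pkg:npm/%40example/demo@1.0.0"))
```

A real implementation should follow the PURL specification in full, including per-type normalisation rules, which this sketch does not attempt.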

Additional languages include those supported by deps.dev: Python, NodeJS, .Net, and Rust. Note that currently these will only work when specifying an artifact PURL as input, or providing an SBOM. Full support for these extra languages will require the addition of new dependency analyzers.

A new config option is also provided to disable API calls to Google's Open Source Insights, if desired.
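
As a rough illustration of how the new option could be read, the sketch below parses an INI fragment with Python's standard `configparser`. The section name `repofinder` and the options `find_repos` and `use_open_source_insights` come from the `defaults.ini` changes in this commit; the function name `load_repo_finder_flags` and the fallback behaviour are assumptions made for the example, and Macaron's own `defaults` object may work differently.

```python
import configparser

# Example configuration that turns off calls to Open Source Insights.
EXAMPLE_DEFAULTS = """
[repofinder]
find_repos = True
use_open_source_insights = False

[repofinder.java]
artifact_repositories = https://repo.maven.apache.org/maven2
find_parents = True
"""


def load_repo_finder_flags(ini_text: str) -> tuple[bool, bool]:
    """Return (find_repos, use_open_source_insights) from INI text.

    Hypothetical helper: both flags fall back to True when absent,
    mirroring the default values shown in this commit's defaults.ini.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    find_repos = parser.getboolean("repofinder", "find_repos", fallback=True)
    use_osi = parser.getboolean("repofinder", "use_open_source_insights", fallback=True)
    return find_repos, use_osi


if __name__ == "__main__":
    find_repos, use_osi = load_repo_finder_flags(EXAMPLE_DEFAULTS)
    print(f"find_repos={find_repos}, use_open_source_insights={use_osi}")
```

With `use_open_source_insights` set to `False`, a caller would skip the deps.dev lookup; per the commit message, the additional languages depend on that lookup.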

Signed-off-by: Ben Selwyn-Smith <[email protected]>
benmss authored Sep 21, 2023
1 parent 7350b55 commit bf118b3
Showing 30 changed files with 2,785 additions and 1,418 deletions.
Original file line number Diff line number Diff line change
@@ -40,11 +40,3 @@ macaron.dependency\_analyzer.dependency\_resolver module
:members:
:undoc-members:
:show-inheritance:

macaron.dependency\_analyzer.java\_repo\_finder module
------------------------------------------------------

.. automodule:: macaron.dependency_analyzer.java_repo_finder
:members:
:undoc-members:
:show-inheritance:
50 changes: 50 additions & 0 deletions docs/source/pages/developers_guide/apidoc/macaron.repo_finder.rst
@@ -0,0 +1,50 @@
macaron.repo\_finder package
============================

.. automodule:: macaron.repo_finder
:members:
:undoc-members:
:show-inheritance:

Submodules
----------

macaron.repo\_finder.repo\_finder module
----------------------------------------

.. automodule:: macaron.repo_finder.repo_finder
:members:
:undoc-members:
:show-inheritance:

macaron.repo\_finder.repo\_finder\_base module
----------------------------------------------

.. automodule:: macaron.repo_finder.repo_finder_base
:members:
:undoc-members:
:show-inheritance:

macaron.repo\_finder.repo\_finder\_deps\_dev module
---------------------------------------------------

.. automodule:: macaron.repo_finder.repo_finder_deps_dev
:members:
:undoc-members:
:show-inheritance:

macaron.repo\_finder.repo\_finder\_java module
----------------------------------------------

.. automodule:: macaron.repo_finder.repo_finder_java
:members:
:undoc-members:
:show-inheritance:

macaron.repo\_finder.repo\_validator module
-------------------------------------------

.. automodule:: macaron.repo_finder.repo_validator
:members:
:undoc-members:
:show-inheritance:
1 change: 1 addition & 0 deletions docs/source/pages/developers_guide/apidoc/macaron.rst
@@ -19,6 +19,7 @@ Subpackages
macaron.output_reporter
macaron.parsers
macaron.policy_engine
macaron.repo_finder
macaron.slsa_analyzer

Submodules
@@ -17,6 +17,14 @@ macaron.slsa\_analyzer.build\_tool.base\_build\_tool module
:undoc-members:
:show-inheritance:

macaron.slsa\_analyzer.build\_tool.docker module
------------------------------------------------

.. automodule:: macaron.slsa_analyzer.build_tool.docker
:members:
:undoc-members:
:show-inheritance:

macaron.slsa\_analyzer.build\_tool.gradle module
------------------------------------------------

57 changes: 50 additions & 7 deletions docs/source/pages/using.rst
@@ -104,7 +104,7 @@ To simplify the examples, we use the same configurations as above if needed (e.g
The list bellow shows examples for the corresponding PURL strings for different git repositories:

.. list-table:: Example of PURL strings for git repositories.
.. list-table:: Examples of PURL strings for git repositories.
:widths: 50 50
:header-rows: 1

@@ -133,6 +133,39 @@ You can also provide the PURL string together with the repository path. In this
.. note:: When providing the PURL and the repository path, both the branch name and commit digest must be provided as well.

''''''''''''''''''''''''''''''''''''''
Providing an artifact as a PURL string
''''''''''''''''''''''''''''''''''''''

The PURL format supports artifacts as well as repositories, and Macaron supports (some of) these too.

.. code-block::
pkg:<package_type>/<artifact_details>
Where ``artifact_details`` varies based on the provided ``package_type``. Examples for those currently supported by Macaron are as follows:

.. list-table:: Examples of PURL strings for artifacts.
:widths: 50 50
:header-rows: 1

* - Package Type
- PURL String
* - Maven (Java)
- ``pkg:maven/org.apache.xmlgraphics/[email protected]``
* - PyPi (Python)
- ``pkg:pypi/[email protected]``
* - Cargo (Rust)
- ``pkg:cargo/[email protected]``
* - NuGet (.Net)
- ``pkg:nuget/[email protected]``
* - NPM (NodeJS)
- ``pkg:npm/%40angular/[email protected]``

For more detailed information on converting a given artifact into a PURL, see `PURL Specification <https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst>`_ and `PURL Types <https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst>`_

.. note:: If a repository is not also provided, Macaron will try to discover it based on the artifact purl. For this to work, ``find_repos`` in the configuration file **must be enabled**\. See `Analyzing more dependencies <#more-deps>`_ for more information about the configuration options of the Repository Finding feature.

-------------------------------------------------
Verifying provenance expectations in CUE language
-------------------------------------------------
@@ -191,6 +224,8 @@ With the example above, the generated output reports can be seen here:
- `micronaut-core.html <../_static/examples/micronaut-projects/micronaut-core/analyze_with_sbom/micronaut-core.html>`__
- `micronaut-core.json <../_static/examples/micronaut-projects/micronaut-core/analyze_with_sbom/micronaut-core.json>`__

.. _more-deps:

'''''''''''''''''''''''''''
Analyzing more dependencies
'''''''''''''''''''''''''''
@@ -203,30 +238,38 @@ This feature is enabled by default. To disable, or configure its behaviour in ot

See :ref:`dump-defaults <action_dump_defaults>`, the CLI command to dump the default configurations in ``defaults.ini``. After making changes, see :ref:`analyze <analyze-action-cli>` CLI command for the option to pass the modified ``defaults.ini`` file.

Within the configuration file under the ``repofinder.java`` header, five options exist: ``find_repos``, ``artifact_repositories``, ``repo_pom_paths``, ``find_parents``, ``artifact_ignore_list``. These options behave as follows:
Within the configuration file under the ``repofinder.java`` header, three options exist: ``artifact_repositories``, ``repo_pom_paths``, ``find_parents``. These options behave as follows:

- ``find_repos`` (Values: True or False) - Enables or disables the Repository Finding feature.
- ``artifact_repositories`` (Values: List of URLs) - Determines the remote artifact repositories to attempt to retrieve dependency information from.
- ``repo_pom_paths`` (Values: List of POM tags) - Determines where to search for repository information in the POM files. E.g. scm.url.
- ``find_parents`` (Values: True or False) - When enabled, the Repository Finding feature will also search for repository URLs in parents POM files of the current dependency.
- ``artifact_ignore_list`` (Values: List of GAs) - The Repository Finding feature will skip any artifact in this list. Format is "GroupId":"ArtifactId". E.g. org.apache.maven:maven

Under the related header ``repofinder``, two more options exist: ``find_repos``, and ``use_open_source_insights``:

- ``find_repos`` (Values: True or False) - Enables or disables the Repository Finding feature.
- ``use_open_source_insights`` (Values: True or False) - Enables or disables use of Google's Open Source Insights API.

.. note:: Finding repositories requires at least one remote call, adding some additional overhead to an analysis run.

.. note:: Google's Open Source Insights API is currently used to find repositories for: Python, Rust, .Net, NodeJS

An example configuration file for utilising this feature:

.. code-block:: ini
[repofinder.java]
[repofinder]
find_repos = True
use_open_source_insights = True
[repofinder.java]
artifact_repositories = https://repo.maven.apache.org/maven2
repo_pom_paths =
scm.url
scm.connection
scm.developerConnection
find_parents = True
artifact_ignore_list =
org.apache.maven:maven
-------------------------------------
Analyzing a locally cloned repository
13 changes: 13 additions & 0 deletions scripts/dev_scripts/integration_tests.sh
@@ -9,6 +9,7 @@ HOMEDIR=$2
RESOURCES=$WORKSPACE/src/macaron/resources
COMPARE_DEPS=$WORKSPACE/tests/dependency_analyzer/compare_dependencies.py
COMPARE_JSON_OUT=$WORKSPACE/tests/e2e/compare_e2e_result.py
TEST_REPO_FINDER=$WORKSPACE/tests/e2e/repo_finder/repo_finder.py
RUN_MACARON="python -m macaron -o $WORKSPACE/output"
RESULT_CODE=0

@@ -532,3 +533,15 @@ then
echo -e "Expected zero status code but got $RESULT_CODE."
exit 1
fi

# Testing the Repo Finder's remote calls.
# This requires the 'packageurl' Python module
echo -e "\n----------------------------------------------------------------------------------"
echo "Testing Repo Finder functionality."
echo -e "----------------------------------------------------------------------------------\n"
python $TEST_REPO_FINDER || log_fail
if [ $? -ne 0 ];
then
echo -e "Expect zero status code but got $?."
log_fail
fi
6 changes: 3 additions & 3 deletions src/macaron/__main__.py
@@ -31,9 +31,9 @@ def analyze_slsa_levels_single(analyzer_single_args: argparse.Namespace) -> None
# We don't mention --config-path as a possible option in this log message as it going to be move soon.
# See: https://github.com/oracle/macaron/issues/417
logger.error(
"Analysis target missing. Please provide a package url (PURL) and/or repo path. "
+ "Examples of a PURL can be seen at https://github.com/package-url/purl-spec: "
+ "pkg:github/micronaut-projects/micronaut-core."
"""Analysis target missing. Please provide a package url (PURL) and/or repo path.
Examples of a PURL can be seen at https://github.com/package-url/purl-spec:
pkg:github/micronaut-projects/micronaut-core."""
)
sys.exit(os.EX_USAGE)

8 changes: 4 additions & 4 deletions src/macaron/config/defaults.ini
@@ -44,19 +44,19 @@ timeout = 2400
recursive = False

# This is the repo finder script.
[repofinder]
find_repos = True
use_open_source_insights = True

[repofinder.java]
# The list of maven-like repositories to attempt to retrieve artifact POMs from.
artifact_repositories = https://repo.maven.apache.org/maven2
find_repos = True
repo_pom_paths =
scm.url
scm.connection
scm.developerConnection
find_parents = True
parent_limit = 10
# Disables repo finding for specific artifacts based on their group and artifact IDs. Format: {groupId}:{artifactId}
# E.g. com.oracle.coherence.ce:coherence
artifact_ignore_list =

# Git services that Macaron has access to clone repositories.
# For security purposes, Macaron will only clone repositories from the hostnames specified.
1 change: 0 additions & 1 deletion src/macaron/config/global_config.py
@@ -21,7 +21,6 @@ class GlobalConfig:
gh_token: str = ""
debug_level: int = logging.DEBUG
resources_path: str = ""
find_repos: bool = True

def load(
self,
34 changes: 24 additions & 10 deletions src/macaron/dependency_analyzer/cyclonedx.py
@@ -9,11 +9,14 @@
from collections.abc import Iterable
from pathlib import Path

from packageurl import PackageURL

from macaron.config.defaults import defaults
from macaron.config.global_config import global_config
from macaron.dependency_analyzer.dependency_resolver import DependencyAnalyzer, DependencyInfo
from macaron.errors import MacaronError
from macaron.output_reporter.scm import SCMStatus
from macaron.repo_finder.repo_validator import find_valid_repository_url

logger: logging.Logger = logging.getLogger(__name__)

@@ -160,21 +163,32 @@ def convert_components_to_artifacts(
Returns
-------
dict
A dictionary where dependency artifacts are grouped based on "artifactId:groupId".
A dictionary where dependency artifacts are grouped based on "groupId:artifactId".
"""
all_versions: dict[str, list[DependencyInfo]] = {} # Stores all the versions of dependencies for debugging.
latest_deps: dict[str, DependencyInfo] = {} # Stores the latest version of dependencies.
url_to_artifact: dict[str, set] = {} # Used to detect artifacts that have similar repos.
for component in components:
try:
# TODO make this function language agnostic when CycloneDX SBOM processing also is.
# See https://github.com/oracle/macaron/issues/464
key = f"{component.get('group')}:{component.get('name')}"
if component.get("purl"):
purl = PackageURL.from_string(str(component.get("purl")))
else:
# TODO remove maven assumption when optional non-existence of the component's purl is handled
# See https://github.com/oracle/macaron/issues/464
purl = PackageURL(
type="maven",
namespace=component.get("group"),
name=component.get("name"),
version=component.get("version") or None,
)

# According to PEP-0589 all keys must be present in a TypedDict.
# See https://peps.python.org/pep-0589/#totality
item = DependencyInfo(
version=component.get("version") or "",
group=component.get("group") or "",
name=component.get("name") or "",
purl=component.get("purl") or "",
purl=purl,
url="",
note="",
available=SCMStatus.AVAILABLE,
@@ -187,10 +201,10 @@ def convert_components_to_artifacts(
# IN case of a build error, we use this as a heuristic to avoid analyzing
# submodules that produce development artifacts in the same repo.
if (
"snapshot"
in (item.get("version") or "").lower() # or "" is not necessary but mypy produces a FP otherwise.
"snapshot" in (purl.version or "").lower()
# or "" is not necessary but mypy produces a FP otherwise.
and root_component
and item.get("group") == root_component.get("group")
and purl.namespace == root_component.get("group")
):
continue
logger.debug(
@@ -199,7 +213,7 @@ def convert_components_to_artifacts(
)
else:
# Find a valid URL.
item["url"] = DependencyAnalyzer.find_valid_url(
item["url"] = find_valid_repository_url(
link.get("url") for link in component.get("externalReferences") # type: ignore
)

@@ -228,7 +242,7 @@ def get_deps_from_sbom(sbom_path: str | Path) -> dict[str, DependencyInfo]:
Returns
-------
A dictionary where dependency artifacts are grouped based on "artifactId:groupId".
A dictionary where dependency artifacts are grouped based on "groupId:artifactId".
"""
return convert_components_to_artifacts(
get_dep_components(