CodeGate doesn't distinguish Python built-in modules and external packages #518

Open
danbarr opened this issue Jan 8, 2025 · 6 comments
@danbarr
Collaborator

danbarr commented Jan 8, 2025

Describe the issue

CodeGate isn't aware of the built-in Python modules, and may treat imports of these as references to external packages.

The specific case I've encountered is hashlib. It was once an external package, so it exists on PyPI and therefore in our data set, but the external package was archived and the module moved into the standard library long ago. When CodeGate encounters import hashlib in code, it finds the archived package in the vector DB and reports it as archived/deprecated.

Insight report - https://www.insight.stacklok.com/report/pypi/hashlib
PyPI entry - https://pypi.org/project/hashlib/20081119/
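
One way to express the missing check is to consult the interpreter's own list of standard-library module names before looking up a package. The sketch below is only an illustration: `is_stdlib_module` and `external_imports` are hypothetical helpers, not CodeGate functions. It relies on `sys.stdlib_module_names`, which exists in Python 3.10 and later; the small fallback set for older interpreters is an assumption made for the example.

```python
import sys

# sys.stdlib_module_names is available from Python 3.10 onward.
# The fallback set is a tiny illustrative stand-in, not a complete list.
_STDLIB_NAMES = getattr(
    sys, "stdlib_module_names", frozenset({"hashlib", "os", "sys", "json"})
)


def is_stdlib_module(import_name: str) -> bool:
    """Return True if the top-level module ships with the interpreter."""
    top_level = import_name.split(".", 1)[0]
    return top_level in _STDLIB_NAMES


def external_imports(import_names):
    """Keep only the imports that could refer to an external package."""
    return [name for name in import_names if not is_stdlib_module(name)]
```

With a filter like this, `import hashlib` would never reach the package lookup, so the archived PyPI entry would not be reported for it.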

CodeGate behavior: [screenshot]

Steps to Reproduce

Reference the app.py file from the codegate-demonstration repo using Copilot or Continue chat.

Operating System

MacOS (Arm)

IDE and Version

VS Code 1.96.2

Extension and Version

Any

Provider

GitHub Copilot

Model

Any

Logs

2025-01-08T21:19:27.008Z [debug    ] Found matching packages in sqlite-vec database matched_packages=['hashlib (crates)', 'hashlib (pypi)', 'invokehttp (pypi)'] module=codegate pathname=/app/src/codegate/pipeline/codegate_context_retriever/codegate.py
2025-01-08T21:19:27.008Z [debug    ] Final context message          context_message=Context: hashlib is a Rust package available on Crates ecosystem.  However, this package is found to be archived and no longer maintained. For additional information refer to https://www.insight.stacklok.com/report/crates/hashlib - Package offers this functionality: Provide various hash algorithms under a same abstraction layer.
hashlib is a Python package available on PyPI ecosystem.  However, this package is found to be deprecated and no longer recommended for use. For additional information refer to https://www.insight.stacklok.com/report/pypi/hashlib - Package offers this functionality: Secure hash and message digest algorithm library

Additional Context

No response

@danbarr
Collaborator Author

danbarr commented Jan 8, 2025

There's a potential secondary issue here too: CodeGate is reporting this as both a Crates and a PyPI package even though this is a Python file. Shall I open a separate issue for this?

[screenshot]
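
To make the secondary issue concrete, here is a rough sketch of restricting matches to the ecosystem of the file being discussed. It assumes matches arrive as strings in the "name (ecosystem)" form shown in the debug log above. The suffix-to-ecosystem mapping and the function name `filter_by_file_ecosystem` are hypothetical and are not taken from the CodeGate code base.

```python
import re
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to package ecosystem.
ECOSYSTEM_BY_SUFFIX = {".py": "pypi", ".rs": "crates"}

# Parses entries such as "hashlib (pypi)" as they appear in the debug log.
_MATCH_RE = re.compile(r"^(?P<name>.+) \((?P<ecosystem>[^()]+)\)$")


def filter_by_file_ecosystem(matched_packages, filename):
    """Drop matches whose ecosystem differs from the file's ecosystem."""
    suffix = PurePosixPath(filename).suffix.lower()
    ecosystem = ECOSYSTEM_BY_SUFFIX.get(suffix)
    if ecosystem is None:
        # Unknown file type: leave the matches untouched.
        return list(matched_packages)
    kept = []
    for entry in matched_packages:
        parsed = _MATCH_RE.match(entry)
        # Keep entries we cannot parse rather than silently losing them.
        if parsed is None or parsed.group("ecosystem") == ecosystem:
            kept.append(entry)
    return kept
```

Applied to the logged matches with `app.py` as the file name, this would drop the Crates entry and keep the two PyPI entries.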

@lukehinds
Contributor

@ptelang is this covered by #475?

@lukehinds
Contributor

@ptelang retest

@ptelang
Contributor

ptelang commented Jan 13, 2025

> There's a potential secondary issue here too: CodeGate is reporting this as both a Crates and a PyPI package even though this is a Python file. Shall I open a separate issue for this?

This issue is fixed in the latest version by this PR.

@ptelang
Contributor

ptelang commented Jan 13, 2025

Currently, CodeGate cannot identify libraries like hashlib that were once external packages but are now built into Python.

We can address this issue once the projects functionality is implemented. CodeGate can then read the dependency files (e.g. requirements.txt, pyproject.toml) to detect cases like hashlib and prevent the false positive.
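
As a rough illustration of that idea, the sketch below collects the package names declared in a requirements.txt and only treats an import as an external package when it is declared there. The helper names (`declared_packages`, `should_flag`) are hypothetical, the parser handles only simple requirement lines, and it assumes the import name equals the distribution name, which does not hold for every package.

```python
import re

# Leading project name of a simple requirement line, e.g. "Flask>=2.0".
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def normalize(name: str) -> str:
    """Normalize a project name in the style of PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def declared_packages(requirements_text: str) -> set:
    """Collect normalized package names from requirements.txt content."""
    names = set()
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip options such as -r or -e
            continue
        match = _NAME_RE.match(line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def should_flag(import_name: str, declared: set) -> bool:
    """Flag an import only if the project declares it as a dependency."""
    top_level = import_name.split(".", 1)[0]
    return normalize(top_level) in declared
```

Under this approach, a project that never lists hashlib as a dependency would not get the archived-package warning for `import hashlib`.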

@lukehinds
Contributor

@ptelang this is fixed now?

3 participants