Allow browsing and discovery of knowledge base in the search page #1073

Merged
sabaimran merged 18 commits into master from features/add-a-knowledge-base-page on Jan 21, 2025

Conversation

sabaimran
Member

Currently, browsing the uploaded knowledge base is opaque and difficult. Effectively, the only way to do it is through the small file modal in the settings page.

Update the search page to show all indexed files for viewing and deletion. The function to delete all files remains in the settings page.

Add a migration that associates file objects with entries using a foreign key. Add a management command that deletes dangling file objects.

- Managing the files that make up the user's knowledge base in Khoj is currently opaque and hard to access
- Add a migration that associates file objects directly with Entry objects, making it easier to fetch data via the foreign key
- Add a new page that shows all indexed files in the search view and allows uploading new documents directly from it
- Support new APIs for getting and deleting files
- Remove the knowledge page from the sidebar
- Improve the speed and rendering of documents in the search page
@sabaimran sabaimran requested a review from debanjum January 11, 2025 05:11
@sabaimran sabaimran changed the title Allow browsing and discover of knowlege base in the search page Allow browsing and discovery of knowlege base in the search page Jan 13, 2025
Member

@debanjum debanjum left a comment

Left some comments but otherwise the changes look good!

src/khoj/routers/api_content.py (outdated, resolved)
src/interface/web/app/search/page.tsx (outdated, resolved)
src/interface/web/app/search/page.tsx (resolved)
src/interface/web/app/settings/page.tsx (outdated, resolved)
entries_to_update = []
for entry in entries:
    try:
        file_object = file_objects_map.get((entry.user_id, entry.file_path))
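
For readers without the full diff, the lookup in this excerpt can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the migration's actual code: `FileObject`, `Entry`, the `file_name` and `file_object_id` fields, and `link_entries_to_files` are hypothetical stand-ins for the real Django models, and a file object is assumed to be identified by its user and file path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileObject:  # hypothetical stand-in for the Django model
    id: int
    user_id: int
    file_name: str  # assumed to hold the same path string as Entry.file_path


@dataclass
class Entry:  # hypothetical stand-in for the Django model
    id: int
    user_id: int
    file_path: str
    file_object_id: Optional[int] = None  # the new foreign key, unset by default


def link_entries_to_files(entries, file_objects):
    """Set file_object_id on each entry whose (user_id, file_path) matches a file object."""
    # Build the map once so each entry needs a dictionary lookup, not a scan or a query.
    file_objects_map = {(f.user_id, f.file_name): f for f in file_objects}

    entries_to_update = []
    for entry in entries:
        file_object = file_objects_map.get((entry.user_id, entry.file_path))
        if file_object is None:
            continue  # no matching file object; leave the entry unlinked
        entry.file_object_id = file_object.id
        entries_to_update.append(entry)
    return entries_to_update
```

In a Django migration, the returned list would typically be written back in one `bulk_update` call rather than by saving each entry individually.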
Member

Should we also delete orphaned file objects as part of this migration?

Member Author

I'm inclined to say no. It's better to avoid running complex data migrations alongside database migrations, to reduce the risk of long-running jobs. I'm on the fence even about running migrate_entry_objects here, except that self-hosted users would be left out of the new setup if it didn't run automatically, which could break server-side expectations.

The consequence of not deleting orphaned file objects automatically is that self-hosted users might see those dangling files in their /search page. I think that's an acceptable tradeoff to avoid auto-running large data deletion operations live.

Here are some alternatives:

  1. We could add management command instructions for Docker-friendly environments to the release notes.
  2. Alternatively, we could reuse the older migration code we built in cli.py when everything was based on config.yml, but it would have to be upgraded to match how things work now.
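
A manual cleanup of orphaned file objects could follow a dry-run-first pattern, so that nothing is deleted until an operator opts in. The sketch below is a plain-Python illustration of that idea, not the actual management command: `find_orphaned_file_ids`, `delete_orphans`, the `apply` flag, and the dictionary used as a stand-in for the file table are all hypothetical.

```python
def find_orphaned_file_ids(file_object_ids, entry_file_object_ids):
    """Return the ids of file objects that no entry references, in sorted order."""
    # Entries that were never linked carry None; they reference no file object.
    referenced = {fid for fid in entry_file_object_ids if fid is not None}
    return sorted(set(file_object_ids) - referenced)


def delete_orphans(store, entry_file_object_ids, apply=False):
    """Report orphaned file objects; remove them from `store` only when apply=True.

    `store` is a dict mapping file object id to file name, standing in for the table.
    """
    orphan_ids = find_orphaned_file_ids(store.keys(), entry_file_object_ids)
    if apply:
        for fid in orphan_ids:
            del store[fid]
    return orphan_ids
```

Calling `delete_orphans(store, refs)` first shows what would be removed; passing `apply=True` then performs the deletion, which keeps large deletions an explicit operator decision.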

Member Author

Let's continue the discussion after merge. We can add/modify as needed before release.

@sabaimran sabaimran merged commit e0dcd11 into master Jan 21, 2025
10 checks passed
@sabaimran sabaimran deleted the features/add-a-knowledge-base-page branch January 21, 2025 00:14