[Feature]: Support Bulk Seed List Uploads #2319

Open
ikreymer opened this issue Jan 18, 2025 · 0 comments
Labels
enhancement New feature or request

ikreymer commented Jan 18, 2025

What change would you like to see?

Users should be able to upload a bulk seed list as a text file, as an alternative to entering URLs in the list text box in the URL List option.

The text file can be of any size, with no limit on the number of seeds it specifies (though other crawl limits may still apply).
The file would be stored in the S3 bucket, mounted into the crawler as a volume, and read through the crawler's existing --seedList functionality.
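
To make the storage-to-crawler handoff concrete, here is a minimal sketch of how an uploaded file's storage key might be turned into crawler arguments. The mount directory and the `seed_file_args` helper are hypothetical, and the flag spellings simply follow the names used in this issue (`--seedList`, `failOnFailedSeed`); they should be checked against the crawler's actual CLI.

```python
from pathlib import PurePosixPath

# Hypothetical mount point for the seed file volume (an assumption,
# not a path taken from the project).
SEED_MOUNT_DIR = PurePosixPath("/tmp/seeds")


def seed_file_args(object_key: str, fail_on_failed_seed: bool = False) -> list[str]:
    """Build crawler arguments that point at a mounted seed file.

    object_key is the storage key of the uploaded text file,
    e.g. "org/123/seeds.txt". Flag names follow the issue text.
    """
    # Only the final path component is used, so the file always
    # resolves to a single entry inside the mount directory.
    filename = PurePosixPath(object_key).name
    if not filename:
        raise ValueError(f"object key has no filename: {object_key!r}")

    args = ["--seedList", str(SEED_MOUNT_DIR / filename)]
    if fail_on_failed_seed:
        args.append("--failOnFailedSeed")
    return args
```

Because the crawler reads the file from the mounted path, the seed list itself never has to pass through the workflow config, which is what removes the size limit of the text box.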

Some differences from the list text box:

  • Validation: Since the file bypasses the frontend, seeds would not be validated when the crawl workflow is created. Instead, invalid seeds should quickly appear in the error log once the crawl starts running. If failOnFailedSeed is set, an invalid seed should also fail the whole crawl immediately.
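
The validation behavior described above could look roughly like the following sketch. It is not the crawler's implementation: `load_seeds`, `seed_problem`, and `FailedSeedError` are hypothetical names, and the validity check (http/https scheme plus a host) is an assumed stand-in for whatever the crawler actually treats as a failed seed.

```python
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


class FailedSeedError(Exception):
    """Raised when an invalid seed should abort the whole crawl."""


def seed_problem(url: str) -> Optional[str]:
    """Return a short reason if the seed looks invalid, else None."""
    try:
        parts = urlsplit(url)
    except ValueError as exc:  # e.g. a malformed IPv6 host
        return str(exc)
    if parts.scheme not in ("http", "https"):
        return "scheme must be http or https"
    if not parts.netloc:
        return "missing host"
    return None


def load_seeds(
    lines: Iterable[str], fail_on_failed_seed: bool = False
) -> Tuple[List[str], List[str]]:
    """Read one seed per line, returning (valid seeds, error log entries).

    Lines are consumed lazily, so an open file object of any size works.
    """
    seeds: List[str] = []
    errors: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        url = raw.strip()
        if not url:
            continue  # skip blank lines
        problem = seed_problem(url)
        if problem is None:
            seeds.append(url)
            continue
        message = f"line {lineno}: invalid seed {url!r}: {problem}"
        if fail_on_failed_seed:
            # Mirrors the failOnFailedSeed behavior: stop at the first bad seed.
            raise FailedSeedError(message)
        errors.append(message)
    return seeds, errors
```

Called as `load_seeds(open("seeds.txt"))`, this collects bad lines into an error log and keeps going; with `fail_on_failed_seed=True` the first bad line raises instead.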

Context

This issue supersedes #1107 and addresses #2312.

Projects
Status: Triage