
Issue #728: Adding asterisk for metadata transfer #1117

Merged: 3 commits merged into apache:master on Nov 13, 2023


michaeldinzinger (Contributor)

Hello all

Goal:
When there are many metadata fields that all start with the same prefix, e.g. parse.[...], one can now simply write

metadata.persist:
- parse.*

instead of listing them all.
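
One way to implement this kind of prefix matching is sketched below. The class name `WildcardKeyMatcher` and its structure are hypothetical and not taken from the merged commit; the sketch assumes that only a trailing `*` acts as a wildcard, so `parse.*` covers every key starting with `parse.`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the code from the merged commit: decides whether a
// metadata key is covered by a list of configured patterns.
final class WildcardKeyMatcher {

    private final Set<String> exactKeys = new HashSet<>();
    private final List<String> prefixes = new ArrayList<>();

    WildcardKeyMatcher(List<String> patterns) {
        for (String pattern : patterns) {
            if (pattern.endsWith("*")) {
                // "parse.*" becomes the prefix "parse."
                prefixes.add(pattern.substring(0, pattern.length() - 1));
            } else {
                exactKeys.add(pattern);
            }
        }
    }

    boolean matches(String key) {
        // Exact names are a constant-time lookup; only prefixes need a scan.
        if (exactKeys.contains(key)) {
            return true;
        }
        for (String prefix : prefixes) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With `new WildcardKeyMatcher(List.of("parse.*", "depth"))`, the keys `parse.title` and `depth` match, while `parsed` does not, because the prefix keeps the trailing dot.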

Besides that, this PR changes mdToTransfer, mdToPersistOnly, trackPath and trackDepth from private to protected, which makes it easier to create a custom MetadataTransfer class.
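
To illustrate why the visibility change helps, here is a simplified, hypothetical pair of classes. `BaseTransfer` stands in for the real metadata transfer class: only the field name `mdToTransfer` comes from the PR, while its `Set<String>` type and the `filter` method are assumptions. `CustomTransfer` extends it by adding a key directly to the protected field, which would not compile if the field were private.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a metadata transfer class. Only the
// field name mdToTransfer comes from the PR; its type and filter() are assumptions.
class BaseTransfer {

    // protected rather than private, so subclasses can read and extend it
    protected final Set<String> mdToTransfer = new LinkedHashSet<>();

    BaseTransfer(Set<String> keys) {
        mdToTransfer.addAll(keys);
    }

    // Returns the parent entries whose key is listed in mdToTransfer.
    Map<String, String> filter(Map<String, String> parentMd) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : parentMd.entrySet()) {
            if (mdToTransfer.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}

// Hypothetical custom subclass: always transfers one extra key. Accessing
// mdToTransfer here would not compile if the field were private.
class CustomTransfer extends BaseTransfer {

    CustomTransfer(Set<String> keys) {
        super(keys);
        mdToTransfer.add("customKey");
    }
}
```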

jnioche added this to the 2.11 milestone on Nov 6, 2023
jnioche (Contributor) left a comment

Thanks! This would be very helpful. See comments on possible improvements.

jnioche (Contributor) commented on Nov 12, 2023

That's fab, thanks @michaeldinzinger!

I think the change would affect both the metadata to transfer and the metadata to persist, in which case ideally we would like to have a test demonstrating the impact on the latter. We'd also need to make it explicit in the default config as well as the configs generated by the archetypes.

michaeldinzinger (Contributor, Author)

Thank you again for your quick reply.

> I think the change would affect both the metadata to transfer and the metadata to persist, in which case ideally we would like to have a test demonstrating the impact on the latter.

What could such a demonstration test look like? For now, I extended the method testTransfer to cover the metadata.transfer scenario, whereas the method testFilter tests the metadata.persist scenario.
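
The persist-side effect discussed here can be illustrated with a small, self-contained sketch. It is plain Java rather than the project's JUnit tests, and `StorageFilterSketch`, `matchesAny` and `metadataForStorage` are hypothetical names. It assumes, as a simplification, that a key is stored when it matches either the metadata.transfer or the metadata.persist list, and that a trailing `*` is a prefix wildcard in both.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not StormCrawler's code: selects the metadata to store,
// assuming keys matching metadata.transfer OR metadata.persist are persisted.
final class StorageFilterSketch {

    // A trailing '*' is treated as a prefix wildcard; anything else must match exactly.
    static boolean matchesAny(String key, List<String> patterns) {
        for (String p : patterns) {
            boolean hit = p.endsWith("*")
                    ? key.startsWith(p.substring(0, p.length() - 1))
                    : key.equals(p);
            if (hit) {
                return true;
            }
        }
        return false;
    }

    static Map<String, String> metadataForStorage(
            Map<String, String> md, List<String> transfer, List<String> persist) {
        Map<String, String> kept = new TreeMap<>(); // sorted keys for readable output
        for (Map.Entry<String, String> e : md.entrySet()) {
            if (matchesAny(e.getKey(), transfer) || matchesAny(e.getKey(), persist)) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

For example, with metadata.transfer set to `[depth]` and metadata.persist set to `[parse.*]`, a document carrying `parse.title`, `parse.keywords`, `depth` and `fetch.statusCode` would be stored with `depth`, `parse.keywords` and `parse.title`, while `fetch.statusCode` is dropped.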

> We'd also need to make it explicit in the default config as well as the configs generated by the archetypes.

I added a comment line in each of the crawler-default.yaml files.

jnioche (Contributor) left a comment

Perfect! Thanks a lot @michaeldinzinger, this is a great addition to StormCrawler

jnioche merged commit 4d3340f into apache:master on Nov 13, 2023
3 of 4 checks passed
michaeldinzinger deleted the devAsteriskMetadataTransfer branch on December 8, 2023 15:16