Harvester / Remove records by harvester UUID #8431

Draft
wants to merge 1 commit into main


fxprunayre
Member

When a harvester contains a lot of records, removing them takes a while and can even fail with heap space errors.

This tries to improve performance by using a delete-by-query instead of looping over each record. For example, with 1500 records:

  • Select > Delete all = 2min
  • Harvester > Remove records = 700ms

This bypasses events, but maybe that is fine for harvested records?

Maybe there is a better JPA alternative for this kind of query?
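
To make the trade-off concrete, here is a minimal in-memory sketch (not the code in this PR) contrasting the two approaches. `Rec`, `deleteOneByOne` and `deleteByQuery` are hypothetical names, and a plain list stands in for the database. The point is only that the per-record loop notifies a listener for every deletion, while the bulk removal never does, which is the "bypass events" behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class HarvesterDeleteSketch {
    /** Hypothetical stand-in for a metadata record; not the GeoNetwork entity. */
    static final class Rec {
        final int id;
        final String harvesterUuid;
        Rec(int id, String harvesterUuid) {
            this.id = id;
            this.harvesterUuid = harvesterUuid;
        }
    }

    /** Per-record path: every deletion calls the listener, like an event-driven delete. */
    static int deleteOneByOne(List<Rec> store, String harvesterUuid, Consumer<Rec> onDelete) {
        int deleted = 0;
        // Iterate over a copy so the store can be modified inside the loop.
        for (Rec r : new ArrayList<>(store)) {
            if (r.harvesterUuid.equals(harvesterUuid)) {
                store.remove(r);
                onDelete.accept(r);
                deleted++;
            }
        }
        return deleted;
    }

    /** Bulk path: a single predicate-based removal; no listener is ever called. */
    static int deleteByQuery(List<Rec> store, String harvesterUuid) {
        int before = store.size();
        store.removeIf(r -> r.harvesterUuid.equals(harvesterUuid));
        return before - store.size();
    }

    /** Two records from harvester "h1" and one from "h2". */
    static List<Rec> sampleStore() {
        List<Rec> store = new ArrayList<>();
        store.add(new Rec(1, "h1"));
        store.add(new Rec(2, "h1"));
        store.add(new Rec(3, "h2"));
        return store;
    }

    public static void main(String[] args) {
        List<Rec> events = new ArrayList<>();
        int loopDeleted = deleteOneByOne(sampleStore(), "h1", events::add);
        System.out.println(loopDeleted + " deleted one by one, " + events.size() + " events fired");
        int bulkDeleted = deleteByQuery(sampleStore(), "h1");
        System.out.println(bulkDeleted + " deleted in bulk, no events fired");
    }
}
```

Both paths remove the same records; anything that relies on the per-record events (indexing, backups, file cleanup) has to be handled separately when the bulk path is used.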

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

Funded by Ifremer

@fxprunayre fxprunayre added this to the 4.4.7 milestone Oct 14, 2024
@fxprunayre fxprunayre requested a review from josegar74 October 14, 2024 13:27
@fxprunayre fxprunayre force-pushed the feature/447-harvester-removerecords branch from d151b55 to 2bad0f4 Compare October 14, 2024 13:48

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

Member

@josegar74 josegar74 left a comment

Looks good and it's much faster.

Some considerations:

  1. Metadata harvested with the GeoNetwork ‘protocol’ in MEF format may store files in the data directory. In that case, the files should also be deleted.

     Still to check: whether other harvesters support the MEF format.

  2. If the setting Allow editing on harvested records is enabled, the same problem will occur. Also, when that setting is enabled, it might make sense to use the original method to delete the metadata, since it backs up the deleted metadata.


default void deleteAllByHarvesterUuid(String harvesterUuid) {
Member

What is the reason for defining it as the default method?

Member Author

image

@fxprunayre
Member Author

  1. Metadata harvested with the GeoNetwork ‘protocol’ in MEF format, may store files in the data directory. In this case, the files should be deleted also.

Indeed. Harvesting WMS also produces thumbnails in the datadir most of the time.
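
One possible way to cover both review points, sketched under stated assumptions: collect the record ids before the bulk delete, then remove each record's directory from the data directory, and fall back to the original per-record method when editing of harvested records is allowed. Everything here is hypothetical, including the class and method names and the flat `dataDir/<recordId>` layout, which is a simplification for illustration and not necessarily how GeoNetwork organises its data directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HarvestedRecordCleanupSketch {
    enum Strategy { BULK_DELETE, PER_RECORD_WITH_BACKUP }

    /** Use the slower per-record path (which keeps backups) when harvested records may have been edited. */
    static Strategy chooseStrategy(boolean allowEditingHarvested) {
        return allowEditingHarvested ? Strategy.PER_RECORD_WITH_BACKUP : Strategy.BULK_DELETE;
    }

    /** Recursively delete a directory, children before parents; does nothing if it is missing. */
    static void deleteTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order guarantees files are deleted before the directories containing them.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /**
     * Remove the directory of each record under dataDir (assumed layout: dataDir/recordId).
     * The ids must be collected before the bulk delete runs, since the rows are gone afterwards.
     * Returns the number of directories actually removed.
     */
    static int deleteRecordDirs(Path dataDir, List<Integer> recordIds) throws IOException {
        int removed = 0;
        for (int id : recordIds) {
            Path dir = dataDir.resolve(String.valueOf(id));
            if (Files.isDirectory(dir)) {
                deleteTree(dir);
                removed++;
            }
        }
        return removed;
    }
}
```

Records without any stored files are simply skipped, so the cleanup can run for every harvester type regardless of whether it produced MEF attachments or thumbnails.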

@CLAassistant

CLAassistant commented Dec 8, 2024

CLA assistant check
All committers have signed the CLA.
