Data products duplicated across en-registry and other node registries #351

Open
jordanpadams opened this issue Nov 16, 2024 · 10 comments
Assignees
Labels
B15.1 bug Something isn't working s.high High severity sprint-backlog

Comments

@jordanpadams
Member

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

https://pds.mcp.nasa.gov/api/search/1/products/urn:nasa:pds:xas_synthesized_glasses::1.0/members?fields=lid,ops:Label_File_Info.ops:file_ref

returns duplicate collections, with copies coming from both the EN and GEO registries

🕵️ Expected behavior

I expected only the GEO collections to be returned

📜 To Reproduce

GET https://pds.mcp.nasa.gov/api/search/1/products/urn:nasa:pds:xas_synthesized_glasses::1.0/members?fields=lid,ops:Label_File_Info.ops:file_ref
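A minimal sketch of how the duplicates could be spotted in the response to the request above. It assumes the JSON body has a top-level `data` list whose items carry a `properties` dict mapping each requested field to a list of values; that shape is an assumption, not something confirmed in this issue. The helper name `duplicate_lids` and the sample values are hypothetical.

```python
from collections import defaultdict

# Field names taken from the request's `fields` parameter.
LID = "lid"
FILE_REF = "ops:Label_File_Info.ops:file_ref"


def duplicate_lids(response):
    """Return {lid: [file_refs of each copy]} for lids seen more than once.

    Assumes `response` is the parsed JSON body with a "data" list of
    products, each holding a "properties" dict of field -> list of values.
    """
    refs_by_lid = defaultdict(list)
    for product in response.get("data", []):
        props = product.get("properties", {})
        file_refs = props.get(FILE_REF, [])
        for lid in props.get(LID, []):
            refs_by_lid[lid].append(file_refs)
    # Keep only lids that were returned by more than one product entry.
    return {lid: refs for lid, refs in refs_by_lid.items() if len(refs) > 1}


# Hypothetical sample data, for illustration only.
sample = {
    "data": [
        {"properties": {LID: ["urn:example:a"], FILE_REF: ["https://en.example.org/a.xml"]}},
        {"properties": {LID: ["urn:example:a"], FILE_REF: ["https://geo.example.org/a.xml"]}},
        {"properties": {LID: ["urn:example:b"], FILE_REF: ["https://geo.example.org/b.xml"]}},
    ]
}
duplicates = duplicate_lids(sample)
```

Comparing the `file_ref` values of each duplicated lid should show which copy was loaded by which registry.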

🖥 Environment Info

No response

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

It looks like when context products were loaded into the registry, a lot of other data that should not have been loaded was loaded as well.

We may want to completely wipe out the EN Registry and reload only what we want.

🎉 Integration & Test

No response

@alexdunnjpl
Contributor

@jordanpadams if nuking the en registry is an acceptable solution, it's gonna be much faster than iterating through and cross-checking against GEO. But either is viable.

@jordanpadams
Member Author

@alexdunnjpl I think that is probably the best solution. We just need to scrub the wiki page for loading the data to make sure we specifically document what a full reload looks like.

https://wiki.jpl.nasa.gov/display/PDSEN/v2.+Next-Gen+OpenSearch+Registry

@alexdunnjpl
Contributor

@jordanpadams @tloubrieu-jpl should the -dd index be emptied as well?

Do we want to preserve the existing mappings, or is it wise to nuke and have those regenerated from scratch as well?

@alexdunnjpl
Contributor

@sjoshi-jpl could I trouble you to take this over once these questions are answered? I don't have permission to delete/recreate the en-registry* indices, which will be by far the fastest way of resolving this.

@jordanpadams
Member Author

@alexdunnjpl @sjoshi-jpl this is going to require some coordination in order to make sure once we recreate the registry, we immediately reload all the necessary data.

@tloubrieu-jpl
Member

tloubrieu-jpl commented Nov 26, 2024

We decided to delete all the EN indices, re-create them with registry-mgr, and re-populate them by harvesting a fixed set of directories:
--> delete en-registry, en-registry-dd, en-registry-refs
--> run registry-mgr to create the indices again
--> load the data from the following directories:

/data/pds4/misc
/data/pds4/documents
/data/pds4/context-pds4/agency
/data/pds4/context-pds4/node
/home/pds4/edwg/data/pds4/context-pds4/facility
/home/pds4/edwg/data/pds4/context-pds4/instrument
/home/pds4/edwg/data/pds4/context-pds4/instrument_host
/home/pds4/edwg/data/pds4/context-pds4/investigation
/home/pds4/edwg/data/pds4/context-pds4/service
/home/pds4/edwg/data/pds4/context-pds4/target
/home/pds4/edwg/data/pds4/context-pds4/telescope
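The delete step of the plan above could be sketched as follows. This only builds the requests and does not send them, since the deletion is destructive and needs coordinating with the reload. It relies on OpenSearch's delete-index API (`DELETE /<index>`); the base URL and the helper name `delete_requests` are hypothetical, and authentication is left out.

```python
from urllib.parse import quote

# Index names taken from the plan above.
EN_INDICES = ["en-registry", "en-registry-dd", "en-registry-refs"]


def delete_requests(base_url, indices=EN_INDICES):
    """Return (method, url) pairs for deleting each index, without sending them."""
    base = base_url.rstrip("/")
    return [("DELETE", f"{base}/{quote(name, safe='')}") for name in indices]


# Dry run against a hypothetical endpoint: print what would be sent.
for method, url in delete_requests("https://opensearch.example.org"):
    print(method, url)
```

Listing the indices explicitly, rather than deleting by an `en-registry*` wildcard, keeps the dry-run output reviewable before anything is removed.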

@alexdunnjpl
Contributor

Removing my assignment as no action required

@alexdunnjpl alexdunnjpl removed their assignment Nov 26, 2024
@jordanpadams jordanpadams transferred this issue from NASA-PDS/registry-api Dec 3, 2024
@alexdunnjpl
Contributor

Awaiting return of @tloubrieu-jpl @sjoshi-jpl, at which point this will be top priority per @jordanpadams

@tloubrieu-jpl
Member

We need a role that allows writing to the discipline node's own index and reading from all the other nodes' indexes.
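One way such a role might look, as a sketch only: it assumes the cluster uses the OpenSearch security plugin's role format (`index_permissions` entries with `index_patterns` and `allowed_actions`, using the predefined `read` and `write` action groups). Whether this deployment manages access that way is not stated here. The `<node>-registry*` naming is extrapolated from `en-registry*`, and `build_node_role` is a hypothetical helper.

```python
import json


def build_node_role(node, all_nodes_pattern="*-registry*"):
    """Build a role body: read/write on the node's own indices, read-only elsewhere.

    Assumes indices are named "<node>-registry*" (extrapolated from
    "en-registry*"); adjust the patterns to the real naming scheme.
    """
    return {
        "index_permissions": [
            {
                # The node's own indices: read and write.
                "index_patterns": [f"{node}-registry*"],
                "allowed_actions": ["read", "write"],
            },
            {
                # Every node's registry indices: read only.
                "index_patterns": [all_nodes_pattern],
                "allowed_actions": ["read"],
            },
        ]
    }


role = build_node_role("geo")
print(json.dumps(role, indent=2))
```

With a role like this, a node could still read the EN indices but could no longer write into them, which is the kind of cross-registry load described in this issue.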

@tloubrieu-jpl
Member

blocked by #350

Projects
Status: ToDo
Development

No branches or pull requests

4 participants