Indexing policy for legacy unreplicated data #8

Open
lliming opened this issue Sep 6, 2024 · 2 comments
@lliming
Contributor

lliming commented Sep 6, 2024

LLNL's Solr index served as an index for several ESGF data nodes that didn't have their own index.
Consequently, LLNL's index contains metadata for datasets that have never been stored at LLNL, ANL, or ORNL.
Q: Should ANL's or ORNL's Phase I indices contain these entries?
Q: Should the Phase II consolidated index contain these entries?

@bstrdsmkr

ORNL believes these datasets have been replicated. Does anyone have evidence to the contrary? If so, we'd like to get them replicated. This means the likely answer to both questions is yes, though that might be a policy question for @climate-dude.

@sashakames

I think there might be some misunderstanding. These are datasets published by NASA, NOAA, CCCma (Canadian), DIAS (Japanese), Taiwan, and several of the Korean and Chinese sites. Not all of these datasets are replicated. What matters is the set of records hosted in the LLNL Solr. We don't need multiple copies of those records migrated to all the DOE site indexes. It would probably be easiest to migrate them along with the LLNL records when migration time comes. I will produce a list of dataset and file counts for the data nodes.
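
For reference, per-node counts like these can be gathered with a Solr facet query. The following is a minimal sketch, not the script actually used here: it assumes the index exposes a `data_node` field, and the base URL, core name, and helper names (`facet_query_url`, `counts_from_response`) are hypothetical. It relies only on standard Solr faceting parameters and on Solr's default JSON facet layout, a flat list alternating value and count.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field="data_node"):
    """Build a Solr select URL that returns only facet counts for `field`.

    `base_url` is the URL of a Solr core (hypothetical here), and
    `data_node` is an assumed field name.
    """
    params = {
        "q": "*:*",           # match every record in the core
        "rows": 0,            # return no documents, only the counts
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # do not truncate the list of facet values
        "facet.mincount": 1,  # skip values with no matching records
    }
    return f"{base_url}/select?{urlencode(params)}"


def counts_from_response(response, field="data_node"):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

Running the query once against the dataset core and once against the file core (if the two are kept in separate cores) would give the dataset and file counts per data node. The parsed JSON response from each request is what `counts_from_response` expects.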


3 participants