Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Kerchunk indices embedded in STAC items #33

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

jsignell
Copy link
Member

Closes: #32

This PR supports the following:

import pystac
import xarray as xr

path = "https://raw.githubusercontent.com/stac-utils/xpystac/kerchunk/tests/data/data-cube-kerchunk-item-collection.json"
item_collection = pystac.ItemCollection.from_file(path)

ds = xr.open_dataset(item_collection)
ds

@jsignell
Copy link
Member Author

@TomAugspurger I switched to just copying the function that you have in xstac over into this library. I wasn't sure how else to handle supporting python 3.9. Do you have thoughts on the roles of these two libraries? It kind of makes sense to me to separate the writing of STAC metadata from the reading of it, and it definitely makes sense that the tool for reading would want to support older version of python, but I don't really have a sense of where shared utils should live or if they should even exist.

@TomAugspurger
Copy link

TomAugspurger commented Oct 30, 2023

Do you have thoughts on the roles of these two libraries? It kind of makes sense to me to separate the writing of STAC metadata from the reading of it

I'm not sure. I do agree that keeping reading and writing separate might make sense. The naming is kinda unfortunate though :)

I can update the function to not use the match statement. I was just having fun, and kind of forgot that we'd actually need this at runtime to decode from STAC -> Kerchunk.

Speaking of which, do you have thoughts on whether it makes sense to just embed the regular Kerchunk indices at kerchunk:indices vs. splitting them up and putting them next to the variables in the datacube extension? My thoughts were

  1. There might be some duplicate info that can be deduplicated (I'm not sure if this is true. I thought that cube:... had a spot for attrs but I apparently made that up)
  2. We (the PC team and anyone else using pgstac) want to avoid JSON strings inside the database as much as possible, so that we can take advantage of pgstac's dehydration. I think in Kerchunk all of the .zattrs and .zarray stuff are strings, and so we'll need some client-side code that converts the objects to strings. I figured if you need some client-side code anyway, a bit more to search for kerchunk stuff throughout the Item wouldn't be too much worse.

But I've been second guessing that decision. Would be curious to hear your thoughts.

@jsignell
Copy link
Member Author

I do agree that keeping reading and writing separate might make sense. The naming is kinda unfortunate though :)

Totally agree. I kind of get around it by thinking wellll no one really needs to care about the name of the library. It's the name of the entrypoint that matters. Which probably gives us flexibility for just renaming this library if we ever want to.

I can update the function to not use the match statement. I was just having fun, and kind of forgot that we'd actually need this at runtime to decode from STAC -> Kerchunk.

Ah yeah I see your comment about that now. It is elegant.

do you have thoughts on whether it makes sense to just embed the regular Kerchunk indices at kerchunk:indices vs. splitting them up and putting them next to the variables in the datacube extension?

I don't have thoughts at this time, but if you aren't already, it might be helpful to think about how these items would look as geoparquet. As the STAC metadata expands it seems like it'll only get more compelling to represent catalogs or at least item collections as geoparquet rather than geojson.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Kerchunk indices embedded in STAC items
2 participants