DataTree roundtrip fails on None group lookup #9960

Open
5 tasks done
sjperkins opened this issue Jan 17, 2025 · 4 comments
Labels
bug topic-DataTree Related to the implementation of a DataTree class topic-zarr Related to zarr storage library

Comments

@sjperkins

What happened?

I'm attempting to use zarr 3.0.0 in conjunction with xarray 2025.1.1 in this PR

A round-tripping test case started to fail.

What did you expect to happen?

I would've expected the test case to succeed, but AFAICT, during the open_datatree call on the local zarr store, one of the zarr groups resolves to None, resulting in the error below.

Minimal Complete Verifiable Example

import xarray
import xarray.testing as xt
import numpy as np

if __name__ == "__main__":
  ds = xarray.Dataset({
    "A": (("x", "y"), np.ones((128, 256))),
    "B": (("y", "x"), np.ones((256, 128))*2)
  })

  dt = xarray.DataTree.from_dict({
    "/root/a": ds,
    "/root/b": ds,
  })

  import shutil
  path = "/tmp/test_dt.zarr"
  shutil.rmtree(path, ignore_errors=True)
  dt.to_zarr(path, compute=True, mode="w")
  dt2 = xarray.open_datatree(path)
  xt.assert_identical(dt, dt2)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

$ python tests/test_zarr_roundtrip.py 
/home/simon/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/zarr/api/asynchronous.py:197: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
Traceback (most recent call last):
  File "/home/simon/code/xarray-ms/tests/test_zarr_roundtrip.py", line 20, in <module>
    dt2 = xarray.open_datatree(path)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/backends/api.py", line 1113, in open_datatree
    backend_tree = backend.open_datatree(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1614, in open_datatree
    groups_dict = self.open_groups_as_dict(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1665, in open_groups_as_dict
    stores = ZarrStore.open_store(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/backends/zarr.py", line 662, in open_store
    return {
           ^
  File "/home/simon/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/backends/zarr.py", line 663, in <dictcomp>
    group: cls(
           ^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/backends/zarr.py", line 744, in __init__
    self._read_only = self.zarr_group.read_only
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'read_only'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.11 (main, Dec 4 2024, 08:55:08) [GCC 13.2.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8
LOCALE: ('en_ZA', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.1.1
pandas: 2.2.3
numpy: 2.1.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.0
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.10.0
distributed: 2024.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 74.1.2
pip: 24.2
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.29.0
sphinx: 8.1.3

@sjperkins sjperkins added bug needs triage Issue that has not been reviewed by xarray team member labels Jan 17, 2025
@TomNicholas TomNicholas added topic-zarr Related to zarr storage library topic-DataTree Related to the implementation of a DataTree class and removed needs triage Issue that has not been reviewed by xarray team member labels Jan 17, 2025
@aladinor
Contributor

Hi @sjperkins and @TomNicholas

I tried to reproduce your example and got the same error. It may have been introduced by Zarr V3.

Zarr V3 does not accept group paths with a leading "/". In your example, the groups are ['/', '/root', '/root/a', '/root/b'], which triggers the error here:

https://github.com/pydata/xarray/blob/609412d8544217247ddf2f72f988da1b38ef01bc/xarray/backends/zarr.py#L662C4-L676C10

I am still working on it. In the meantime, I recommend installing zarr v2 as a temporary workaround.
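A minimal sketch of the kind of path normalization this points to, assuming (as the comment above suggests) that Zarr v3 expects group keys without a leading "/". The helper name `to_zarr_v3_key` is hypothetical and not an xarray or zarr API, and using "" for the root group is likewise an assumption.

```python
def to_zarr_v3_key(group_path: str) -> str:
    """Convert an absolute DataTree-style group path to a relative key.

    Hypothetical helper: assumes Zarr v3 wants keys such as "root/a"
    rather than "/root/a", with "" standing in for the root group.
    """
    # Strip leading (and any trailing) separators; "/" becomes "".
    return group_path.strip("/")


# The group paths reported for the original example.
groups = ["/", "/root", "/root/a", "/root/b"]
keys = [to_zarr_v3_key(g) for g in groups]
print(keys)  # ['', 'root', 'root/a', 'root/b']
```

Relative keys pass through unchanged, so applying the helper twice is harmless.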

@sjperkins
Author

Thanks for taking a look, @aladinor. Interestingly enough, the test case still fails if the tree structure is simplified further:

import xarray
import xarray.testing as xt
import numpy as np

if __name__ == "__main__":
  ds = xarray.Dataset({
    "A": (("x", "y"), np.ones((128, 256))),
    "B": (("y", "x"), np.ones((256, 128))*2)
  })

  dt = xarray.DataTree.from_dict({"a": ds,  "b": ds})

  import shutil
  path = "/tmp/test_dt.zarr"
  shutil.rmtree(path, ignore_errors=True)
  dt.to_zarr(path, compute=True, mode="w")
  dt2 = xarray.open_datatree(path)
  xt.assert_identical(dt, dt2)
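One way to see why the simplified tree would still fail under the leading-"/" hypothesis: DataTree addresses every node by an absolute path, so even a flat tree yields group paths that start with "/". The sketch below derives the full list of group paths (including implicit ancestors) from `from_dict`-style keys using only the standard library; `group_paths_from_keys` is a hypothetical helper, not part of xarray.

```python
from pathlib import PurePosixPath


def group_paths_from_keys(keys):
    """Return every group path implied by DataTree.from_dict-style keys.

    Hypothetical helper: anchors relative keys at the root "/" and adds
    each intermediate ancestor group, sorted for stable output.
    """
    paths = {"/"}
    for key in keys:
        # Anchor relative keys such as "a" at the root, giving "/a".
        path = PurePosixPath("/") / key
        paths.add(str(path))
        # Include implicit parents, e.g. "/root" for "/root/a".
        paths.update(str(parent) for parent in path.parents)
    return sorted(paths)


print(group_paths_from_keys(["/root/a", "/root/b"]))  # ['/', '/root', '/root/a', '/root/b']
print(group_paths_from_keys(["a", "b"]))              # ['/', '/a', '/b']
```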

@aladinor
Contributor

aladinor commented Jan 22, 2025

Well, I just found that the new async implementation in Zarr V3 does not seem to return the same object as Zarr v2. As a result, requesting the group's dataset returns None here:

https://github.com/pydata/xarray/blob/609412d8544217247ddf2f72f988da1b38ef01bc/xarray/backends/zarr.py#L664C16-L664C39

Still looking for solutions.
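The traceback follows directly from such a lookup returning None: the result is passed to the store constructor, which then reads `.read_only` from it. A minimal sketch of a guard that turns this into a descriptive error, using a plain dict as a stand-in for a zarr group; `lookup_group` and `FakeGroup` are hypothetical names, and the dict only mimics a `.get()` that returns None for a missing key.

```python
from dataclasses import dataclass


@dataclass
class FakeGroup:
    """Hypothetical stand-in for a zarr group; only read_only is modelled."""

    read_only: bool = False


def lookup_group(groups: dict, path: str) -> FakeGroup:
    """Look up a group by path, failing loudly instead of returning None."""
    group = groups.get(path)
    if group is None:
        raise KeyError(f"group {path!r} not found; available keys: {sorted(groups)}")
    return group


# Assumed layout: keys stored without a leading "/".
groups = {"": FakeGroup(), "root": FakeGroup(), "root/a": FakeGroup()}

# Unguarded: .get() returns None and the attribute access fails as in the traceback.
try:
    groups.get("/root/a").read_only
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'read_only'

# Guarded: the same lookup now reports which path was missing.
try:
    lookup_group(groups, "/root/a")
except KeyError as err:
    print(err)
```

A guard like this does not fix the path mismatch, but it would surface the missing key instead of an unrelated-looking AttributeError.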

@jhamman
Member

jhamman commented Jan 25, 2025

I've opened #9984 as a central issue for tracking Zarr v3 compatibility in DataTree.
