ValueError raised when opening a 'plain' zarr with xarray+zarr 3 #9970

Open
5 tasks done
oliverwm1 opened this issue Jan 21, 2025 · 1 comment
Labels
bug topic-zarr Related to zarr storage library

Comments


oliverwm1 commented Jan 21, 2025

What happened?

This report is specific to zarr 3. When xarray opens a store that was written directly by zarr-python (i.e. one that lacks the dimension-name metadata xarray requires), the call fails with a ValueError and a cryptic error message when zarr 3 is installed. With earlier versions of zarr-python, a KeyError was raised and the error message was much more informative.

What did you expect to happen?

Raise a KeyError and give a more useful error message.

Minimal Complete Verifiable Example

import xarray
import zarr
import numpy

path = 'foo.zarr'

z = zarr.open_group(path)
arr = z.create_array('bar', shape=(3,5), dtype=numpy.float32)
arr[:] = 1.0

xarray.open_zarr(path)

MCVE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "/Users/oliverwm/xarray-opening-zarr.py", line 11, in <module>
    xarray.open_zarr(path)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1491, in open_zarr
    ds = open_dataset(
         ^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/api.py", line 679, in open_dataset
    backend_ds = backend.open_dataset(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1581, in open_dataset
    ds = store_entrypoint.open_dataset(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/store.py", line 44, in open_dataset
    vars, attrs = filename_or_obj.load()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/common.py", line 312, in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
                                              ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 858, in get_variables
    return FrozenDict((k, self.open_store_variable(k)) for k in self.array_keys())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/core/utils.py", line 415, in FrozenDict
    return Frozen(dict(*args, **kwargs))
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 858, in <genexpr>
    return FrozenDict((k, self.open_store_variable(k)) for k in self.array_keys())
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 824, in open_store_variable
    "preferred_chunks": dict(zip(dimensions, zarr_array.chunks, strict=True)),
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1

Anything else we need to know?

With zarr==2.18.4, a KeyError is raised instead, and the error message is more useful:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 390, in _get_zarr_dims_and_attrs
    dimensions = zarr_obj.attrs[dimension_key]
                 ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/zarr/attrs.py", line 74, in __getitem__
    return self.asdict()[item]
           ~~~~~~~~~~~~~^^^^^^
KeyError: '_ARRAY_DIMENSIONS'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 404, in _get_zarr_dims_and_attrs
    os.path.basename(dim) for dim in zarray["_NCZARR_ARRAY"]["dimrefs"]
                                     ~~~~~~^^^^^^^^^^^^^^^^^
KeyError: '_NCZARR_ARRAY'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/oliverwm/xarray-opening-zarr.py", line 11, in <module>
    xarray.open_zarr(path)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1491, in open_zarr
    ds = open_dataset(
         ^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/api.py", line 679, in open_dataset
    backend_ds = backend.open_dataset(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1581, in open_dataset
    ds = store_entrypoint.open_dataset(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/store.py", line 44, in open_dataset
    vars, attrs = filename_or_obj.load()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/common.py", line 312, in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
                                              ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 858, in get_variables
    return FrozenDict((k, self.open_store_variable(k)) for k in self.array_keys())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/core/utils.py", line 415, in FrozenDict
    return Frozen(dict(*args, **kwargs))
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 858, in <genexpr>
    return FrozenDict((k, self.open_store_variable(k)) for k in self.array_keys())
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 817, in open_store_variable
    dimensions, attributes = _get_zarr_dims_and_attrs(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test-xr-zr/lib/python3.11/site-packages/xarray/backends/zarr.py", line 407, in _get_zarr_dims_and_attrs
    raise KeyError(
KeyError: 'Zarr object is missing the attribute `_ARRAY_DIMENSIONS` and the NCZarr metadata, which are required for xarray to determine variable dimensions.'
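
The chained traceback above implies a lookup order: first the `_ARRAY_DIMENSIONS` attribute, then the NCZarr `dimrefs`, then a descriptive KeyError. The sketch below mirrors that order using plain dicts. The function name `get_dims_zarr2_sketch` and its arguments are hypothetical stand-ins, not xarray's actual implementation.

```python
import os


def get_dims_zarr2_sketch(attrs, zarray):
    """Hypothetical stand-in for the lookup order visible in the traceback."""
    try:
        # First choice: the attribute xarray writes itself.
        return tuple(attrs["_ARRAY_DIMENSIONS"])
    except KeyError:
        try:
            # Fallback: NCZarr dimension references, reduced to their base names.
            return tuple(
                os.path.basename(dim) for dim in zarray["_NCZARR_ARRAY"]["dimrefs"]
            )
        except KeyError as err:
            # Neither source exists: raise the informative error from the log.
            raise KeyError(
                "Zarr object is missing the attribute `_ARRAY_DIMENSIONS` and the "
                "NCZarr metadata, which are required for xarray to determine "
                "variable dimensions."
            ) from err


print(get_dims_zarr2_sketch({"_ARRAY_DIMENSIONS": ["x", "y"]}, {}))
print(get_dims_zarr2_sketch({}, {"_NCZARR_ARRAY": {"dimrefs": ["/x", "/y"]}}))
```

Calling the function with two empty dicts corresponds to the 'plain' zarr case in this report and ends in the final KeyError.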

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.1.1
pandas: 2.2.3
numpy: 2.2.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.1
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.1.0
pip: 24.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

@oliverwm1 oliverwm1 added bug needs triage Issue that has not been reviewed by xarray team member labels Jan 21, 2025
@TomNicholas TomNicholas added topic-zarr Related to zarr storage library and removed needs triage Issue that has not been reviewed by xarray team member labels Jan 23, 2025
jhamman (Member) commented Jan 23, 2025

I looked into this today. This is where we extract the dimensions field from the v3 metadata:

dimensions = zarr_obj.metadata.dimension_names or ()

What we're missing is a check that the dimensions found are valid. Confirming that len(dimensions) == len(shape) seems like it could work, but some testing would be required to make sure this is all that is needed.
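
A rough sketch of what such a check might look like, written as a standalone function so it runs without xarray or zarr. The helper name `check_dimension_names`, its arguments, and the choice of KeyError (to match the zarr 2 behaviour described in the report) are all assumptions, and other invalid cases may still need separate handling.

```python
def check_dimension_names(dimensions, shape, name):
    """Hypothetical helper: require one dimension name per array axis."""
    # Treat missing metadata (None) the same as an empty tuple.
    dimensions = tuple(dimensions or ())
    if len(dimensions) != len(shape):
        raise KeyError(
            f"Zarr array {name!r} has {len(shape)} axes but "
            f"{len(dimensions)} dimension names; xarray needs one name per "
            "axis to determine variable dimensions."
        )
    return dimensions


# A named 2-D array passes; the 'plain' array from the report would not.
print(check_dimension_names(("x", "y"), (3, 5), "bar"))
try:
    check_dimension_names(None, (3, 5), "bar")
except KeyError as err:
    print("rejected:", err)
```

Running the check before the `zip(..., strict=True)` call would let the missing-metadata case fail with a message that names the array, rather than with the bare zip error.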
