What happened?

1. Call `to_zarr` on a dataset with a string S3 URL as the `store` argument; the dataset is written to the bucket.
2. Call `to_zarr` again on the same dataset with the same URL as the `store` argument => a `FileExistsError` is raised.
3. Catch the error.
4. Delete the just-written dataset from the S3 bucket, using s3fs for example (the result is the same when using boto).
5. Call `to_zarr` again on the same dataset with the same URL as the `store` argument => a `FileExistsError` is raised again.
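In code, the sequence boils down to this (condensed from the full MVCE below; `ds` and `url` stand for the dataset and the `s3://bucket/...` string):

```python
ds.to_zarr(url)  # 1. first write succeeds
ds.to_zarr(url)  # 2. raises FileExistsError (expected for the default mode "w-")
# 3./4. catch the error, then delete everything under url (s3fs or boto, same result)
ds.to_zarr(url)  # 5. raises FileExistsError again, although the prefix is now empty
```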
What did you expect to happen?
As the dataset has been deleted, there should be no error and the dataset should be rewritten to the S3 bucket.
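For completeness: the error message itself points at a workaround, passing `mode="w"` to overwrite unconditionally, but this report is about the default `mode="w-"` still failing after the data is gone. A minimal sketch of that workaround, not a fix for the underlying behavior:

```python
# Hedged workaround sketch: mode="w" tells zarr to delete any existing keys
# under the target before writing, so the (stale) emptiness check used by
# the default mode "w-" is never consulted.
xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data", mode="w")
```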
Minimal Complete Verifiable Example
```python
from s3fs import S3FileSystem
from xarray import open_dataset
import os
from dotenv import load_dotenv
from testcontainers.localstack import LocalStackContainer
import pytest

load_dotenv(".test.env")

BUCKET_NAME = os.environ["S3_BUCKET"]


@pytest.fixture
def s3_container():
    port = 8566
    container = (
        LocalStackContainer().with_env("SERVICES", "s3").with_bind_ports(4566, port)
    )
    with container as container:
        region = "us-east-1"
        client = container.get_client("s3", region_name=region)
        client.create_bucket(Bucket=BUCKET_NAME)
        yield container


@pytest.mark.usefixtures("s3_container")
def test_rewrite_after_delete_after_error():
    xarray_dataset = open_dataset("data.nc")
    xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data")
    s3fs = S3FileSystem()
    # bucket/data is *not* empty, data has been written successfully
    assert s3fs.find(BUCKET_NAME) != []
    assert s3fs.find(f"{BUCKET_NAME}/data") != []
    try:
        # Write the same data again, to trigger a FileExistsError
        xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data")
    except FileExistsError:
        # we just silence the error and carry on.
        pass
    # Erase all the data from the bucket. The data should be writable again.
    for path in s3fs.ls(f"s3://{os.environ['S3_BUCKET']}", detail=False):
        s3fs.rm(path, recursive=True)
    # The bucket *is* empty. This can be checked with the aws cli.
    assert s3fs.find(BUCKET_NAME) == []
    # The data can be written to data2
    xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data2")
    # bucket/data2 is not empty, data has been written successfully
    assert s3fs.find(f"{BUCKET_NAME}/data2") != []
    # But a FileExistsError is triggered when writing to bucket/data again,
    # even though the directory is empty
    xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data")
```
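The repro reads a local `data.nc`, which ships with the MRE project linked below. If you don't have it, presumably any small dataset triggers the same behavior; a hypothetical stand-in:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the repo's data.nc; the content should not matter.
xr.Dataset({"t": ("x", np.arange(10))}).to_netcdf("data.nc")
```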
MVCE confirmation
Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.
Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
```
================================================== FAILURES ==================================================
___________________________________ test_rewrite_after_delete_after_error ____________________________________

    @pytest.mark.usefixtures("s3_container")
    def test_rewrite_after_delete_after_error():
        xarray_dataset = open_dataset("data.nc")
        xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data")
        s3fs = S3FileSystem()
        # bucket/data is *not* empty, data has been written successfully
        assert s3fs.find(BUCKET_NAME) != []
        assert s3fs.find(f"{BUCKET_NAME}/data") != []
        try:
            # Write the same data again, to trigger a FileExistsError
            xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data")
        except FileExistsError:
            # we just silence the error and carry on.
            pass
        # Erase all the data from the bucket. The data should be writable again.
        for path in s3fs.ls(f"s3://{os.environ['S3_BUCKET']}", detail=False):
            s3fs.rm(path, recursive=True)
        # The bucket *is* empty. This can be checked with the aws cli.
        assert s3fs.find(BUCKET_NAME) == []
        # The data can be written to data2
        xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data2")
        # bucket/data2 is not empty, data has been written successfully
        assert s3fs.find(f"{BUCKET_NAME}/data2") != []
        # But a FileExistsError is triggered when writing to bucket/data again,
        # even though the directory is empty
>       xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data")

test_xarray.py:82:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

.venv/lib/python3.12/site-packages/xarray/core/dataset.py:2622: in to_zarr
    return to_zarr(  # type: ignore[call-overload,misc]
.venv/lib/python3.12/site-packages/xarray/backends/api.py:2194: in to_zarr
    zstore = backends.ZarrStore.open_group(
.venv/lib/python3.12/site-packages/xarray/backends/zarr.py:703: in open_group
    ) = _get_open_params(
.venv/lib/python3.12/site-packages/xarray/backends/zarr.py:1792: in _get_open_params
    zarr_group = zarr.open_group(store, **open_kwargs)
.venv/lib/python3.12/site-packages/zarr/_compat.py:43: in inner_f
    return f(*args, **kwargs)
.venv/lib/python3.12/site-packages/zarr/api/synchronous.py:524: in open_group
    sync(
.venv/lib/python3.12/site-packages/zarr/core/sync.py:142: in sync
    raise return_result
.venv/lib/python3.12/site-packages/zarr/core/sync.py:98: in _runner
    return await coro
.venv/lib/python3.12/site-packages/zarr/api/asynchronous.py:800: in open_group
    store_path = await make_store_path(store, mode=mode, storage_options=storage_options, path=path)
.venv/lib/python3.12/site-packages/zarr/storage/_common.py:318: in make_store_path
    result = await StorePath.open(store, path=path_normalized, mode=mode)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'zarr.storage._common.StorePath'>, store = <FsspecStore(S3FileSystem, bucket/data)>, path = '', mode = 'w-'

    @classmethod
    async def open(
        cls, store: Store, path: str, mode: AccessModeLiteral | None = None
    ) -> StorePath:
        """
        Open StorePath based on the provided mode.

        * If the mode is 'w-' and the StorePath contains keys, raise a FileExistsError.
        * If the mode is 'w', delete all keys nested within the StorePath
        * If the mode is 'a', 'r', or 'r+', do nothing

        Parameters
        ----------
        mode : AccessModeLiteral
            The mode to use when initializing the store path.

        Raises
        ------
        FileExistsError
            If the mode is 'w-' and the store path already exists.
        """
        await store._ensure_open()
        self = cls(store, path)

        # fastpath if mode is None
        if mode is None:
            return self

        if store.read_only and mode != "r":
            raise ValueError(f"Store is read-only but mode is '{mode}'")

        match mode:
            case "w-":
                if not await self.is_empty():
                    msg = (
                        f"{self} is not empty, but `mode` is set to 'w-'."
                        "Either remove the existing objects in storage,"
                        "or set `mode` to a value that handles pre-existing objects"
                        "in storage, like `a` or `w`."
                    )
>                   raise FileExistsError(msg)
E                   FileExistsError: <FsspecStore(S3FileSystem, bucket/data)> is not empty, but `mode` is set to 'w-'. Either remove the existing objects in storage, or set `mode` to a value that handles pre-existing objects in storage, like `a` or `w`.

.venv/lib/python3.12/site-packages/zarr/storage/_common.py:91: FileExistsError
------------------------------------------ Captured stderr setup ------------------------------------------
Pulling image localstack/localstack:2.0.1
Container started: e9d5537e47e1
Waiting for container <Container: e9d5537e47e1> with image localstack/localstack:2.0.1 to be ready ...
```
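Reading the traceback, the `w-` branch fails in `StorePath.is_empty()`, which lists the path through the fsspec filesystem. s3fs/fsspec cache both filesystem instances and directory listings, so a plausible (unconfirmed) diagnostic is to clear those caches before retrying; a sketch, assuming a stale cached listing is what the emptiness check sees:

```python
from s3fs import S3FileSystem

fs = S3FileSystem()
fs.invalidate_cache()                # drop this instance's cached directory listings
S3FileSystem.clear_instance_cache()  # or drop the cached instances altogether

# then retry the write that previously raised FileExistsError
# xarray_dataset.to_zarr(f"s3://{BUCKET_NAME}/data")
```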
Anything else we need to know?
I put together a full MRE project, which can be found here. It can be run by:

1. cloning the repo
2. installing the dependencies (I personally used uv for that, but the pyproject.toml file should be standard enough)
3. creating an aws profile for the testcontainers s3 instance, e.g.:
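(The profile block itself did not survive extraction; a plausible stand-in, assuming LocalStack's usual dummy credentials and the host port bound in the fixture:)

```ini
# ~/.aws/credentials — hypothetical profile for the LocalStack container
[localstack]
aws_access_key_id = test
aws_secret_access_key = test

# ~/.aws/config
[profile localstack]
region = us-east-1
endpoint_url = http://localhost:8566
```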
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!
nicos68 changed the title from "Apparent cache poisoning when a FileExistsError is triggered" to "Apparent cache refresh bug when a FileExistsError is triggered" on Jan 24, 2025.
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.7 (main, Oct 16 2024, 04:37:19) [Clang 18.1.8 ]
python-bits: 64
OS: Linux
OS-release: 6.12.9-1-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development
xarray: 2025.1.1
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.15.1
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 3.0.1
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2025.1.0
distributed: 2025.1.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.8.2
fsspec: 2024.12.0
cupy: None
pint: None
sparse: None
flox: 0.9.15
numpy_groupies: 0.11.2
setuptools: None
pip: None
conda: None
pytest: 8.3.4
mypy: None
IPython: None
sphinx: None