Following the [Cache management](https://huggingface.co/docs/datasets/en/cache) documentation and the behaviour of datasets 2.18.0, one should be able to change the cache directory. Previously, all downloaded, extracted, and otherwise generated files ended up in this folder. Since I updated to the latest version, this is no longer the case: downloaded files are stored in ~/.cache/huggingface/hub.
When the cache_dir argument is passed to load_dataset, the cache directory is created and contains some files, but the bulk is still in ~/.cache/huggingface/hub.
I believe this could be solved by adding the cache_dir argument here
I would expect the bulk of files related to the dataset to be stored somewhere in ~/custom/cache/path/esc50, but it seems they are in ~/.cache/huggingface/hub/datasets--ashraq--esc50.
Hi! Since datasets 3.x, the datasets-specific files are stored in cache_dir=, while the files downloaded from the Hub are cached by huggingface_hub. You can set the directory of that Hub cache with the HF_HOME environment variable.
The two caches are independent: for example, you can delete the Hub cache (containing downloaded files) and still reload your cached datasets from the datasets cache (containing prepared datasets in Arrow format).
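To make the split between the two caches concrete, here is a minimal sketch of how their locations follow from the environment. The helper `resolve_hf_cache_dirs` is hypothetical (it is not part of datasets or huggingface_hub). It only mirrors the documented defaults, assuming `HF_HUB_CACHE` and `HF_DATASETS_CACHE` override the `hub` and `datasets` subfolders of `HF_HOME`, and that `HF_HOME` defaults to `~/.cache/huggingface`.

```python
import os
from pathlib import Path


def resolve_hf_cache_dirs(env=None):
    """Return (hub_cache, datasets_cache) as implied by the environment.

    Hypothetical helper: it mirrors the documented defaults and does not
    call datasets or huggingface_hub.
    """
    env = os.environ if env is None else env
    # Assumed default root: $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface
    default_home = Path(env.get("XDG_CACHE_HOME", "~/.cache")) / "huggingface"
    hf_home = Path(env.get("HF_HOME", default_home)).expanduser()
    # Downloaded Hub files (managed by huggingface_hub)
    hub_cache = Path(env.get("HF_HUB_CACHE", hf_home / "hub")).expanduser()
    # Prepared Arrow datasets (managed by datasets when cache_dir= is not given)
    datasets_cache = Path(env.get("HF_DATASETS_CACHE", hf_home / "datasets")).expanduser()
    return hub_cache, datasets_cache


# Example: relocating both caches by setting HF_HOME only
hub, ds = resolve_hf_cache_dirs({"HF_HOME": "/data/hf"})
print(hub, ds)

# In a real script the variable has to be set before the libraries are imported:
# os.environ["HF_HOME"] = "/data/hf"
# from datasets import load_dataset
# load_dataset("ashraq/esc50", cache_dir="/data/hf/esc50")
```

Passing cache_dir= to load_dataset only moves the prepared datasets. The downloaded files follow the Hub cache location, which is why both settings are needed to keep everything out of ~/.cache/huggingface.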
Describe the bug
Following the [Cache management](https://huggingface.co/docs/datasets/en/cache) documentation and the behaviour of datasets 2.18.0, one should be able to change the cache directory. Previously, all downloaded, extracted, and otherwise generated files ended up in this folder. Since I updated to the latest version, this is no longer the case: downloaded files are stored in ~/.cache/huggingface/hub.
When the cache_dir argument is passed to load_dataset, the cache directory is created and contains some files, but the bulk is still in ~/.cache/huggingface/hub.
I believe this could be solved by adding the cache_dir argument here
Steps to reproduce the bug
For example using https://huggingface.co/datasets/ashraq/esc50:
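A minimal sketch of such a reproduction, assuming the dataset is loaded with the custom path named in this report. The helpers `dir_size_bytes` and `reproduce` are hypothetical names introduced for illustration. `reproduce` needs the datasets package and network access, so it is defined but not called here.

```python
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of regular files under root (0 if root does not exist).

    Symlinks are skipped so that the Hub cache's snapshot links to its
    blob files are not counted twice.
    """
    root = Path(root).expanduser()
    if not root.exists():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def reproduce(cache_dir="~/custom/cache/path/esc50"):
    """Load the dataset with a custom cache_dir and compare both locations."""
    from datasets import load_dataset  # third-party; downloads from the Hub

    cache_dir = str(Path(cache_dir).expanduser())
    load_dataset("ashraq/esc50", cache_dir=cache_dir)
    hub_dir = "~/.cache/huggingface/hub/datasets--ashraq--esc50"
    # Returns (bytes under the custom cache, bytes under the default Hub cache)
    return dir_size_bytes(cache_dir), dir_size_bytes(hub_dir)
```

Comparing the two returned sizes shows where the bulk of the files ended up.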
Expected behavior
I would expect the bulk of files related to the dataset to be stored somewhere in ~/custom/cache/path/esc50, but it seems they are in ~/.cache/huggingface/hub/datasets--ashraq--esc50.
Environment info
datasets version: 3.2.0
huggingface_hub version: 0.26.5
fsspec version: 2024.6.1