Commit

Make it clear that sorting in dataset viewer is only on first 5GB (#1577)
severo authored Jan 17, 2025
1 parent 71720d9 commit 9c3c831
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/hub/datasets-viewer.md
@@ -33,8 +33,8 @@ You can share a specific row by clicking on it, and then copying the URL in the

The Dataset Viewer supports large scale datasets, but depending on the data format it may only show the first 5GB of the dataset:

-- For Parquet datasets: the Dataset Viewer shows the full dataset, but filtering and search are only enabled on the first 5GB.
-- For datasets >5GB in other formats (e.g. [WebDataset](https://github.com/webdataset/webdataset) or JSON Lines): the Dataset Viewer only shows the first 5GB, and filtering and search are enabled on these first 5GB.
+- For Parquet datasets: the Dataset Viewer shows the full dataset, but sorting, filtering and search are only enabled on the first 5GB.
+- For datasets >5GB in other formats (e.g. [WebDataset](https://github.com/webdataset/webdataset) or JSON Lines): the Dataset Viewer only shows the first 5GB, and sorting, filtering and search are enabled on these first 5GB.

In this case, an informational message lets you know that the Viewer is partial. This should be a large enough sample to represent the full dataset accurately, let us know if you need a bigger sample.
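
The two rules in the updated text can be sketched as a small function. This is only an illustration of the documented behaviour, not the Viewer's actual implementation: `viewer_coverage`, `ViewerCoverage`, and `LIMIT_BYTES` are hypothetical names, and treating "5GB" as 5 × 10^9 bytes is an assumption (the docs do not say whether the limit is decimal or binary).

```python
from dataclasses import dataclass

# Assumption: "5GB" is read as 5 * 10**9 bytes; the docs do not specify this.
LIMIT_BYTES = 5 * 10**9


@dataclass(frozen=True)
class ViewerCoverage:
    """Hypothetical summary of what the Viewer covers for one dataset."""
    shown_bytes: int       # how much of the dataset can be browsed
    queryable_bytes: int   # how much is covered by sorting, filtering and search
    partial: bool          # True when any feature covers only a prefix


def viewer_coverage(data_format: str, size_bytes: int) -> ViewerCoverage:
    # Sorting, filtering and search never reach past the first 5GB.
    queryable = min(size_bytes, LIMIT_BYTES)
    if data_format.lower() == "parquet":
        # Parquet: the full dataset is shown.
        shown = size_bytes
    else:
        # Other formats (e.g. WebDataset, JSON Lines): only the first 5GB is shown.
        shown = queryable
    return ViewerCoverage(shown, queryable, partial=queryable < size_bytes)


big_parquet = viewer_coverage("parquet", 8 * 10**9)
big_jsonl = viewer_coverage("jsonl", 8 * 10**9)
small_parquet = viewer_coverage("parquet", 10**9)
```

Under these assumptions, an 8GB Parquet dataset is fully browsable but sortable only over its first 5GB, while an 8GB JSON Lines dataset is both shown and sortable only over its first 5GB.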
