From 9c3c8310ffd87e707a596d2343de593e5a9bf3d8 Mon Sep 17 00:00:00 2001
From: Sylvain Lesage
Date: Fri, 17 Jan 2025 11:05:21 +0100
Subject: [PATCH] Make it clear that sorting in dataset viewerin only on first 5GB (#1577)

---
 docs/hub/datasets-viewer.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/hub/datasets-viewer.md b/docs/hub/datasets-viewer.md
index 27f426b11..28f8715e1 100644
--- a/docs/hub/datasets-viewer.md
+++ b/docs/hub/datasets-viewer.md
@@ -33,8 +33,8 @@ You can share a specific row by clicking on it, and then copying the URL in the
 
 The Dataset Viewer supports large scale datasets, but depending on the data format it may only show the first 5GB of the dataset:
 
-- For Parquet datasets: the Dataset Viewer shows the full dataset, but filtering and search are only enabled on the first 5GB.
-- For datasets >5GB in other formats (e.g. [WebDataset](https://github.com/webdataset/webdataset) or JSON Lines): the Dataset Viewer only shows the first 5GB, and filtering and search are enabled on these first 5GB.
+- For Parquet datasets: the Dataset Viewer shows the full dataset, but sorting, filtering and search are only enabled on the first 5GB.
+- For datasets >5GB in other formats (e.g. [WebDataset](https://github.com/webdataset/webdataset) or JSON Lines): the Dataset Viewer only shows the first 5GB, and sorting, filtering and search are enabled on these first 5GB.
 
 In this case, an informational message lets you know that the Viewer is partial. This should be a large enough sample to represent the full dataset accurately, let us know if you need a bigger sample.