Skip to content

Commit

Permalink
Rename previous canonical datasets (#1243)
Browse files Browse the repository at this point in the history
* glue -> nyu-mll/glue

* rename squad
  • Loading branch information
severo authored Mar 14, 2024
1 parent 1669464 commit e18bc19
Show file tree
Hide file tree
Showing 3 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/hub/api.md
Original file line number Diff line number Diff line change
Expand Up @@ -292,7 +292,7 @@ This is equivalent to `huggingface_hub.get_collection()`.

List collections from the Hub, based on some criteria. The supported parameters are:
- `owner` (string): filter collections created by a specific user or organization.
- `item` (string): filter collections containing a specific item. Value must be the item_type and item_id concatenated. Example: `"models/teknium/OpenHermes-2.5-Mistral-7B"`, `"datasets/squad"` or `"papers/2311.12983"`.
- `item` (string): filter collections containing a specific item. Value must be the item_type and item_id concatenated. Example: `"models/teknium/OpenHermes-2.5-Mistral-7B"`, `"datasets/rajpurkar/squad"` or `"papers/2311.12983"`.
- `sort` (string): sort the returned collections. Supported values are `"lastModified"`, `"trending"` (default) and `"upvotes"`.
- `limit` (int): maximum number (100) of collections per page.
- `q` (string): filter based on substrings for titles & descriptions.
Expand Down
2 changes: 1 addition & 1 deletion docs/hub/datasets-overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

## Datasets on the Hub

The Hugging Face Hub hosts a [large number of community-curated datasets](https://huggingface.co/datasets) for a diverse range of tasks such as translation, automatic speech recognition, and image classification. Alongside the information contained in the [dataset card](./datasets-cards), many datasets, such as [GLUE](https://huggingface.co/datasets/glue), include a [Dataset Viewer](./datasets-viewer) to showcase the data.
The Hugging Face Hub hosts a [large number of community-curated datasets](https://huggingface.co/datasets) for a diverse range of tasks such as translation, automatic speech recognition, and image classification. Alongside the information contained in the [dataset card](./datasets-cards), many datasets, such as [GLUE](https://huggingface.co/datasets/nyu-mll/glue), include a [Dataset Viewer](./datasets-viewer) to showcase the data.

Each dataset is a [Git repository](./repositories) that contains the data required to generate splits for training, evaluation, and testing. For information on how a dataset repository is structured, refer to the [Data files Configuration page](./datasets-data-files-configuration). Following the supported repo structure will ensure that the dataset page on the Hub will have a Viewer.

Expand Down
6 changes: 3 additions & 3 deletions docs/hub/datasets-viewer.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ You can search for a word in the dataset by typing it in the search bar at the t

## Share a specific row

You can share a specific row by clicking on it, and then copying the URL in the address bar of your browser. For example https://huggingface.co/datasets/glue/viewer/mrpc/test?p=2&row=241 will open the dataset viewer on the MRPC dataset, on the test split, and on the 241st row.
You can share a specific row by clicking on it, and then copying the URL in the address bar of your browser. For example https://huggingface.co/datasets/nyu-mll/glue/viewer/mrpc/test?p=2&row=241 will open the dataset viewer on the MRPC dataset, on the test split, and on the 241st row.

## Large scale datasets

Expand All @@ -35,7 +35,7 @@ In this case, an informational message lets you know that the Viewer is partial.

## Access the parquet files

To power the dataset viewer, the first 5GB of every dataset are auto-converted to the Parquet format (unless it was already a Parquet dataset). In the dataset viewer (for example, see [`datasets/glue`](https://huggingface.co/datasets/glue)), you can click on [_"Auto-converted to Parquet"_](https://huggingface.co/datasets/glue/tree/refs%2Fconvert%2Fparquet/cola) to access the Parquet files. Please, refer to the [Datasets Server docs](/docs/datasets-server/parquet_process) to learn how to query the dataset parquet files with libraries such as Polars, Pandas or DuckDB.
To power the dataset viewer, the first 5GB of every dataset are auto-converted to the Parquet format (unless it was already a Parquet dataset). In the dataset viewer (for example, see [GLUE](https://huggingface.co/datasets/nyu-mll/glue)), you can click on [_"Auto-converted to Parquet"_](https://huggingface.co/datasets/nyu-mll/glue/tree/refs%2Fconvert%2Fparquet/cola) to access the Parquet files. Please, refer to the [Datasets Server docs](/docs/datasets-server/parquet_process) to learn how to query the dataset parquet files with libraries such as Polars, Pandas or DuckDB.

<Tip>

Expand All @@ -54,7 +54,7 @@ When you create a new dataset, the [`parquet-converter` bot](https://huggingface

### Programmatic access

You can also access the list of Parquet files programmatically using the [Hub API](./api#get-apidatasetsrepoidparquet); for example, endpoint [`https://huggingface.co/api/datasets/glue/parquet`](https://huggingface.co/api/datasets/glue/parquet) lists the parquet files of the glue dataset.
You can also access the list of Parquet files programmatically using the [Hub API](./api#get-apidatasetsrepoidparquet); for example, endpoint [`https://huggingface.co/api/datasets/nyu-mll/glue/parquet`](https://huggingface.co/api/datasets/nyu-mll/glue/parquet) lists the parquet files of the `nyu-mll/glue` dataset.

## Dataset preview

Expand Down

0 comments on commit e18bc19

Please sign in to comment.