add splits gif
severo committed Feb 9, 2024
1 parent f703503 commit b699193
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/hub/datasets-data-files-configuration.md
@@ -5,6 +5,12 @@ There are no constraints on how to structure dataset repositories.
However, if you want the Dataset Viewer to show certain data files, or to separate your dataset into train/validation/test splits, you need to structure your dataset accordingly.
Often it is as simple as naming your data files according to their split names, e.g. `train.csv` and `test.csv`.
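
As a rough illustration of the naming idea, the sketch below groups a list of file names by split based on the file stem alone. The helper `group_files_by_split`, the `SPLIT_NAMES` tuple, and the `unassigned` bucket are hypothetical names introduced here for illustration; this is a simplified model under those assumptions, not the Hub's actual file-matching logic.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed split names for this sketch; the real rules may accept more patterns.
SPLIT_NAMES = ("train", "validation", "test")


def group_files_by_split(paths):
    """Group file paths by split, using only the file stem (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        stem = PurePosixPath(path).stem  # "train.csv" -> "train"
        split = stem if stem in SPLIT_NAMES else "unassigned"
        groups[split].append(path)
    return dict(groups)


files = ["train.csv", "test.csv", "README.md"]
grouped = group_files_by_split(files)
print(grouped)
```

Here `train.csv` and `test.csv` land in their own splits, while `README.md` matches no split name and falls into the `unassigned` bucket.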

## What are splits and configurations?

Machine learning datasets typically have splits and may also have configurations. A _split_ is a subset of the dataset, such as `train` or `test`, that is used during a particular stage of training or evaluating a model. A _configuration_ is a sub-dataset contained within a larger dataset. Configurations are especially common in multilingual speech datasets, where there may be a different configuration for each language. If you're interested in learning more about splits and configurations, check out the [conceptual guide on "Splits and configurations"](https://huggingface.co/docs/datasets-server/configs_and_splits)!

![split-configs-server](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/split-configs-server.gif)
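
One way to picture the relationship is as a two-level mapping: configuration first, then split. The sketch below models a made-up multilingual speech dataset with one configuration per language. The language codes, file paths, and the helpers `list_configs` and `list_splits` are all assumptions for illustration and do not correspond to a real dataset or library API.

```python
# Hypothetical layout: configuration name -> split name -> data files.
dataset_layout = {
    "en": {"train": ["en/train.csv"], "test": ["en/test.csv"]},
    "fr": {"train": ["fr/train.csv"], "test": ["fr/test.csv"]},
}


def list_configs(layout):
    """Return the configuration names, sorted for stable output."""
    return sorted(layout)


def list_splits(layout, config):
    """Return the split names available within one configuration."""
    return sorted(layout[config])


configs = list_configs(dataset_layout)
fr_splits = list_splits(dataset_layout, "fr")
print(configs)
print(fr_splits)
```

Each configuration carries its own set of splits, so selecting data means choosing a configuration and then a split within it.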

## File names and splits

To structure your dataset by naming your data files or directories according to their split names, see the [File names and splits](./datasets-file-names-and-splits) documentation.