add splits gif

huggingface · Feb 9, 2024 · b699193
1 parent f703503
Showing 1 changed file with 6 additions and 0 deletions.
diff --git a/docs/hub/datasets-data-files-configuration.md b/docs/hub/datasets-data-files-configuration.md
@@ -5,6 +5,12 @@ There are no constraints on how to structure dataset repositories.
 However, if you want the Dataset Viewer to show certain data files, or to separate your dataset in train/validation/test splits, you need to structure your dataset accordingly.
 Often it is as simple as naming your data files according to their split names, e.g. `train.csv` and `test.csv`.
 
+## What are splits and configurations?
+
+Machine learning datasets typically have splits and may also have configurations. A _split_ is a subset of the dataset, like `train` or `test`, that is used during a specific stage of training and evaluating a model. A _configuration_ is a sub-dataset contained within a larger dataset. Configurations are especially common in multilingual speech datasets, where there may be a different configuration for each language. If you're interested in learning more about splits and configurations, check out the [conceptual guide on "Splits and configurations"](https://huggingface.co/docs/datasets-server/configs_and_splits)!
+
+![split-configs-server](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/split-configs-server.gif)
+
 ## File names and splits
 
 To structure your dataset by naming your data files or directories according to their split names, see the [File names and splits](./datasets-file-names-and-splits) documentation.