FSD50K Speech Model Fine-tuning

A minimal working example (MWE) of fine-tuning a Transformer-based speech embedder (e.g. wav2vec 2.0) on a subset of FSD50K using pytorch_lightning and HuggingFace transformers.
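
FSD50K clips can carry several labels at once, so one small step on the way to fine-tuning is turning each clip's labels into a multi-hot target vector. The sketch below illustrates that step in plain Python. It is not taken from this repo: the helper names `build_vocab` and `multi_hot` are hypothetical, the label strings are made up for illustration, and it assumes labels arrive as one comma-separated string per clip.

```python
from typing import Dict, List


def build_vocab(label_strings: List[str]) -> Dict[str, int]:
    """Map every distinct label to a column index (sorted for determinism)."""
    labels = {lab for s in label_strings for lab in s.split(",") if lab}
    return {lab: i for i, lab in enumerate(sorted(labels))}


def multi_hot(label_string: str, vocab: Dict[str, int]) -> List[float]:
    """Encode one comma-separated label string as a 0/1 target vector."""
    target = [0.0] * len(vocab)
    for lab in label_string.split(","):
        if lab:  # skip empty fragments such as a trailing comma
            target[vocab[lab]] = 1.0
    return target


# Illustrative label strings (assumption: not read from the dataset files).
clips = ["Bark,Dog", "Guitar,Music", "Dog"]
vocab = build_vocab(clips)
targets = [multi_hot(s, vocab) for s in clips]
```

Vectors of this shape could then be converted to tensors and paired with a multi-label loss such as `torch.nn.BCEWithLogitsLoss`; whether this repo does exactly that is not stated here.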

For a concrete train+test example, please refer to this executable Colab notebook, which imports the code from this repo together with a 500-element subset of the original FSD50K dataset.

Note: this repo is intended as an editable starting point for getting into FSD50K and the PyTorch Lightning + HuggingFace framework, and as a showcase for an end-of-studies project. Choices have been made and some logic has been altered to (greatly) reduce the size of the original code.

Attribution and licenses:

The FSD50K dataset is licensed under CC BY 4.0
The 500-element subset used here only includes CC0 1.0 audio samples

FlorentMeyer/fsd50k_speech_model_finetuning
