Releases: Esukhia/Corpora
Esukhia Corpora (2021)
This is a database, current as of 2021, of Esukhia Tibetan-language corpora. It includes:
- The Children's Story Speech Corpus
- A collection of Frequency Lists (for use in Dakje, https://dakje.io/)
- The Nanhai Corpus (Tibetan speech & text, ~1.2 million words)
- A Parallel Corpus (of English/Tibetan translations from 84000, see: http://84000.co)
- A simplified-scheme, POS-tagged version of SOAS's Digital Communication corpus
- Tibetan speech transcripts pulled from a web crawl