Releases: Esukhia/Corpora

Esukhia Corpora (2021)

26 Oct 03:46
74a7c4a

This is a database, current as of 2021, of Esukhia Tibetan-language corpora. It includes:

  • The Children's Story Speech Corpus
  • A collection of Frequency Lists (for use in Dakje, https://dakje.io/)
  • The Nanhai Corpus (Tibetan speech & text, ~1.2 million words)
  • A Parallel Corpus (of 84000's English/Tibetan translations, see: http://84000.co)
  • A simplified-scheme, POS-tagged version of SOAS's Digital Communication corpus
  • Spoken Tibetan transcripts pulled from a web crawl