Datasets
IsiZulu Second Language Learner Speech Corpus
License: CC-BY-SA-4.0
Locale: zu
Task: CALL
Format: WAV, SQLITE
Size: 5.26 GB
Modern Greek Dictionary
License: CC-BY-NC-ND-4.0
Locale: el-GR
Task: NLP
Format: PARQUET
Size: 12.02 MB
ERT Press
License: CC-BY-NC-ND-4.0
Locale: el-GR
Task: NLP
Format: PARQUET
Size: 32.60 MB
Ladino-Spanish Lexical Resources
License: CC-BY-4.0
Locale: lad, spa
Task: MT
Format: TXT
Size: 39.92 KB
Yoruba-TTS-Dataset
License: NOODL-1.0
Locale: yor
Task: TTS
Format: MP3, TSV
Size: 319.05 MB
Şalom Ladino Corpus
License: CC-BY-4.0
Locale: lad
Task: LM
Format: TXT
Size: 403.16 KB
Ladino: Una Fraza al Diya
License: CC-BY-4.0
Locale: lad
Task: NLP
Format: OGG, JPEG, TSV
Size: 76.35 MB
Imágenes de Señalamientos en México
License: CC-BY-SA-4.0
Locale: es
Task: CV
Format: JPEG, JSON
Size: 2.23 GB
Kanuri Books Corpus
License: CC-BY-4.0
Locale: kr
Task: LM
Format: TXT
Size: 545.68 KB
LibriVox Italian TTS Female Voice
License: CC0-1.0
Locale: it
Task: TTS
Format: MP3, TSV
Size: 61.74 MB
LibriVox Czech TTS Female Voice
License: CC0-1.0
Locale: cs
Task: TTS
Format: MP3, TXT, TSV
Size: 178.58 MB
UK Sort Codes - ASR Evaluation
License: CC-BY-4.0
Locale: en-GB
Task: ASR
Format: WEBM, TSV
Size: 23.76 MB
otomí-hñähñu TTS Voz Masculina
License: CC-BY-SA-4.0
Locale: ote
Task: TTS
Format: MP3, TXT, TSV
Size: 119.54 MB
Yoruba-English Code-Switching (YECS) Corpus
License: NOODL-1.0
Locale: yo, en
Task: ASR
Format: WAV, CSV
Size: 9.71 GB
Awal Tamazight Dataset
License: CC-BY-4.0
Locale: zgh
Task: LM
Format: TSV, JSON, TXT
Size: 11.57 MB
RFE/RL Serbian, Bosnian, and Montenegrin (Balkan) News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: hbs
Task: NLP
Format: TXT
Size: 310.39 MB
RFE/RL Bulgarian News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: bg
Task: NLP
Format: TXT
Size: 49.82 MB
RFE/RL Azerbaijani News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: az, ru
Task: NLP
Format: TXT
Size: 211.65 MB
RFE/RL Belarusian News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: be
Task: NLP
Format: TXT
Size: 486.55 MB
RFE/RL Macedonian News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: mk
Task: NLP
Format: TXT
Size: 133.95 MB
LibriVox Croatian TTS Male Voice
License: CC0-1.0
Locale: hr
Task: TTS
Format: MP3, TXT, TSV
Size: 377.60 MB
RFE/RL Romanian (Moldova) News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: ro, ru, en
Task: NLP
Format: TXT
Size: 311.87 MB
RFE/RL Tajik News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: tg, ru
Task: NLP
Format: TXT
Size: 145.27 MB
Punjabi 10 Hours TTS
License: CC-BY-NC-SA-4.0
Locale: pnb
Task: TTS
Format: WEBM, TSV
Size: 481.96 MB