Datasets
CV Korean Test 25.0 - Noise-Augmented (SCAI)
License: CC0-1.0
Locale: ko
Task: ASR
Format: MP3, JSONL
Size: 21.01 MB
IBT Torwali Literature Corpus
License: CC-BY-NC-4.0
Locale: trw
Task: NLP
Format: TXT
Size: 488.12 KB
Bulu_ALCAM-MultimodalDataset
License: NOODL-1.0
Locale: bum
Task: NLP
Format: MP3, TSV
Size: 31.28 MB
Hausa-TTS-Dataset
License: NOODL-1.0
Locale: hau
Task: TTS
Format: MP3, TSV
Size: 276.90 MB
Tamil Time Aligned Speech Dataset
License: CC-BY-NC-SA-4.0
Locale: tam
Task: ASR
Format: OGG, SRT
Size: 37.11 MB
ViQua² — Visual Question-answering about Quantities
License: CC-BY-SA-4.0
Locale: en-US
Task: CV
Format: JSON, JPEG
Size: 281.05 MB
Bamun-TTS-Dataset
License: NOODL-1.0
Locale: bax
Task: TTS
Format: MP3, TSV
Size: 219.97 MB
Territórios Digitais
License: CC-BY-4.0
Locale: pt, en
Task: N/A
Format: DOCX, PDF, XLSX
Size: 4.24 MB
Chuvash TTS
License: CC-BY-SA-4.0
Locale: cv
Task: TTS
Format: PARQUET
Size: 854.02 MB
RFE/RL Persian News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: fa
Task: NLP
Format: TXT
Size: 307.78 MB
Saraiki 10 Hours TTS Dataset
License: CC-BY-NC-SA-4.0
Locale: skr
Task: TTS
Format: WEBM, TSV
Size: 584.44 MB
Kannada Time Aligned Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: kan
Task: ASR
Format: OGG, SRT
Size: 355.77 MB
Sentence translation difficulty in Spanish - BOUQuET
License: CC-BY-SA-4.0
Locale: es
Task: MT
Format: TSV
Size: 81.48 KB
Yezoum_ALCAM-MultimodalDataset
License: NOODL-1.0
Locale: ewo
Task: NLP
Format: MP3, TSV
Size: 12.81 MB
Common Voice Spontaneous Speech 3.0 - Serian Bidayuh
License: CC0-1.0
Locale: sdo
Task: ASR
Format: MP3
Size: 201.26 MB
Common Voice Scripted Speech 25.0 - Pashto
License: CC0-1.0
Locale: ps
Task: ASR
Format: MP3
Size: 97.81 GB
Common Voice Scripted Speech 25.0 - English
License: CC0-1.0
Locale: en
Task: ASR
Format: MP3
Size: 87.84 GB
Common Voice Scripted Speech 25.0 - Catalan
License: CC0-1.0
Locale: ca
Task: ASR
Format: MP3
Size: 78.67 GB
Bamun-French Parallel Corpus 2.0
License: NOODL-1.0
Locale: bax
Task: MT
Format: TSV
Size: 184.29 KB
Common Voice Scripted Speech 25.0 - Kinyarwanda
License: CC0-1.0
Locale: rw
Task: ASR
Format: MP3
Size: 57.18 GB
Common Voice Scripted Speech 25.0 - French
License: CC0-1.0
Locale: fr
Task: ASR
Format: MP3
Size: 28.39 GB
Common Voice Scripted Speech 25.0 - Spanish
License: CC0-1.0
Locale: es
Task: ASR
Format: MP3
Size: 48.23 GB
Araina Text Corpus (Occitan Aranese)
License: CC0-1.0
Locale: oc
Task: LM
Format: TXT
Size: 22.97 MB
Common Voice Scripted Speech 25.0 - Belarusian
License: CC0-1.0
Locale: be
Task: ASR
Format: MP3
Size: 36.21 GB