License: NOODL-1.0
Steward: Institute of African Digital Humanities
Dataset ID: cmpmpf0jw021bnu0743hu7763
Task: TTS
Release Date: 5/26/2026
Format: MP3, TSV
Size: 141.26 MB
This dataset comprises 1,521 high-quality audio recordings of read speech produced by a single Duala speaker over several sessions. Duala (ISO 639-3: dua), also known as Douala, is a Bantu language of the Niger-Congo family spoken primarily in the Littoral Region of Cameroon, notably in the city of Douala and its surrounding areas. It is a low-resource language with few existing digital speech resources, which makes this dataset a significant contribution to speech and language technology for Duala. Audio files are provided in MP3 format (approx. 147 MB) and total 4 hours, 30 minutes and 41.64 seconds of speech. The dataset includes 16 audio/sentence mapping files in TSV format, containing 1,521 aligned audio/sentence pairs in total. Transcriptions follow the General Alphabet of Cameroonian Languages, a standardised orthography based on the Latin alphabet and augmented with phonetic characters and diacritical marks that represent the tonal and phonological features of Cameroonian languages. The recordings draw on narrative texts about colonial encounters and experiences. These narratives originally existed as oral accounts and audio recordings and were subsequently transcribed, so the read speech reflects a rich oral tradition rendered in text and offers valuable prosodic and lexical diversity for training and evaluating TTS and ASR models. The dataset is intended for research and scientific use in speech technology for Duala.
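The stated totals also fix the average clip length. With $N = 1{,}521$ clips and total duration $T$, the mean $\bar{t}$ below is derived from the figures above rather than reported by the dataset, and individual clips will vary around it:

```latex
\begin{aligned}
T &= 4 \times 3600\,\text{s} + 30 \times 60\,\text{s} + 41.64\,\text{s} = 16\,241.64\,\text{s} \\
\bar{t} &= \frac{T}{N} = \frac{16\,241.64\,\text{s}}{1\,521} \approx 10.68\,\text{s per clip}
\end{aligned}
```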
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
- For research and scientific use only.
- You agree that you will not re-host or re-share this dataset.
Forbidden Usage
You agree not to use the data for: determining the identity of the speakers in the dataset; attempting to clone the voice of, or training models that imitate, the speakers in this dataset; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
The dataset is suitable for speech-related tasks, in particular Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) for Duala. Its aligned audio/text pairs allow speech synthesis and speech recognition models to be trained or evaluated, supporting the development of more inclusive and representative TTS and ASR tools for Duala, a low-resource African language with limited existing digital speech resources.
Duala (ISO 639-3: dua) is a Bantu language of the Niger-Congo family spoken primarily in the Littoral Region of Cameroon, with a significant speaker community in and around the city of Douala. It belongs to the A.20 subgroup of Bantu languages (Guthrie classification). Duala has a rich nominal and verbal morphology, including noun class agreement and tonal distinctions that are grammatically and lexically significant.
The recordings in this dataset were produced by a single speaker of Duala from the Littoral Region of Cameroon. No sub-variety distinctions are encoded in the dataset, though the speaker's variety is representative of the Duala regional norm as spoken in and around Douala.
The orthography used in the transcription of the audio recordings follows the General Alphabet of Cameroonian Languages (GACL), a standardised writing system developed under the auspices of the Cameroonian government and adopted for the codification of the national languages of Cameroon. The GACL is built on the Latin alphabet and augmented with phonetic characters drawn from the International Phonetic Alphabet (IPA), as well as diacritical marks used to represent the tonal and phonological properties of Cameroonian languages.
For Duala specifically, the alphabet includes characters such as ɓ (bilabial implosive), ɛ (open-mid front vowel), ɔ (open-mid back vowel) and ŋ (velar nasal), which represent sounds that are phonemically distinctive in the language but absent from the standard Latin alphabet. Tones, which are lexically and grammatically contrastive in Duala, are marked by diacritics placed over vowels: the acute accent (´) for high tone, the grave accent (`) for low tone, the circumflex (^) for falling tone, the caron (ˇ) for rising tone and the macron (¯) for mid tone. Combinations of these diacritics are used to represent contour tones. Vowel nasalisation and syllable-final nasals are also indicated where relevant. The result is a phonologically precise and consistent orthography, so the transcriptions in this dataset are linguistically faithful representations of the spoken Duala in the recordings.
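Because each tone mark is a Unicode combining character, tone information can be separated from the base letters by canonical decomposition (NFD). The sketch below illustrates this with Python's standard `unicodedata` module. `tone_sequence` and `TONE_MARKS` are hypothetical names, and treating the caron (seen in sample words such as jǐta and ɓwǎm) as rising tone is an assumption of this sketch. Unmarked vowels are simply not reported.

```python
import unicodedata

# Combining diacritics used as tone marks. Acute, grave, circumflex and macron
# follow the description above; mapping the caron to rising tone is an assumption.
TONE_MARKS = {
    "\u0301": "high",     # combining acute accent
    "\u0300": "low",      # combining grave accent
    "\u0302": "falling",  # combining circumflex accent
    "\u0304": "mid",      # combining macron
    "\u030C": "rising",   # combining caron (assumed rising tone)
}


def tone_sequence(text: str) -> list[tuple[str, str]]:
    """Return (base character, tone label) pairs for every tone-marked character.

    NFD splits precomposed letters such as 'á' or 'ǐ' into a base letter plus
    combining marks, so one lookup table covers precomposed and decomposed input.
    """
    pairs = []
    base = ""
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            tone = TONE_MARKS.get(ch)
            if tone is not None and base:
                pairs.append((base, tone))
        else:
            base = ch  # remember the most recent base character
    return pairs


print(tone_sequence("Mǐndó, mí sí ta."))
# → [('i', 'rising'), ('o', 'high'), ('i', 'high'), ('i', 'high')]
```

The same decomposition works for letters with no precomposed Unicode form, such as ɔ̄ or ɛ́, which are always stored as a base letter followed by a combining mark.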
This dataset was compiled as part of a Text-to-Speech data collection initiative for Duala. The textual source material consists of narratives about colonial encounters and experiences, which originally existed as oral and audio recordings and were later transcribed. The speaker read prepared passages drawn from these transcribed narratives, providing naturalistic and culturally grounded speech data. Recordings were made across multiple sessions and subsequently curated, deduplicated and aligned with their corresponding transcriptions.
The dataset consists of prompted read speech in Duala. The textual source material derives from transcribed oral narratives on colonial encounters and experiences, a historically and culturally significant genre in the Duala oral tradition. The recordings offer good prosodic and lexical diversity — including varied sentence lengths, clause structures, and narrative registers — making them suitable for TTS model training and ASR evaluation.
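The dataset does not prescribe an evaluation metric. As a hedged sketch, character error rate (CER) is a common choice for ASR evaluation, and because tone is written with combining diacritics it can be useful to score with and without them, separating tone errors from segmental errors. The helper names below are illustrative, and counting the caron as a tone mark is an assumption.

```python
import unicodedata

# Combining tone diacritics: acute, grave, circumflex, macron and caron
# (treating the caron as a tone mark is an assumption of this sketch).
TONE_MARKS = {"\u0301", "\u0300", "\u0302", "\u0304", "\u030C"}


def symbols(text: str, ignore_tones: bool = False) -> list[str]:
    """NFD-decompose so each tone mark is a separate symbol; optionally drop tone marks."""
    return [
        c for c in unicodedata.normalize("NFD", text)
        if not (ignore_tones and c in TONE_MARKS)
    ]


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance with unit costs, keeping two rows of the DP table."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (no cost if equal)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str, ignore_tones: bool = False) -> float:
    """Character error rate over NFD code points: edits / reference length."""
    ref = symbols(reference, ignore_tones)
    hyp = symbols(hypothesis, ignore_tones)
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)


print(round(cer("Mǐndó", "Mindo"), 3))               # → 0.286 (tone marks counted)
print(cer("Mǐndó", "Mindo", ignore_tones=True))      # → 0.0 (tone marks ignored)
```

Scoring over NFD code points means a missing or wrong tone mark costs exactly one edit, while the base letter is still credited as correct.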
Total size of MP3 audio: approx. 147 MB
Total size of TSV mapping files: approx. 257 KB
This dataset comprises audio clips and audio/text mapping files organised across 16 recording sessions. There are 1,521 audio clips in MP3 format, totalling 4 hours, 30 minutes and 41.64 seconds of speech. The dataset includes 16 audio/text mapping files (mapping.tsv), each containing aligned audio/sentence pairs for the corresponding session, with 1,521 aligned pairs in total. Each TSV file contains the following fields: audio_filename, key, sentence, attempts.
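A minimal loader for the mapping files might look like the following. It assumes that each session directory holds its own mapping.tsv with a header row naming the four fields above, and that the MP3 files sit next to it; neither layout detail is specified here. The meaning of the `attempts` column is not documented, so it is kept as a string. `load_mappings` and `check_mappings` are hypothetical helpers.

```python
import csv
import re
from pathlib import Path

# Field names as listed in the dataset description.
EXPECTED_FIELDS = {"audio_filename", "key", "sentence", "attempts"}
# Sample file names are 32 lowercase hex digits plus ".mp3" (assumed to hold for all files).
FILENAME_RE = re.compile(r"[0-9a-f]{32}\.mp3")


def load_mappings(root: str) -> list[dict[str, str]]:
    """Read every mapping.tsv under `root`, tagging each row with its session directory."""
    rows: list[dict[str, str]] = []
    for tsv_path in sorted(Path(root).rglob("mapping.tsv")):
        with tsv_path.open(encoding="utf-8", newline="") as f:
            # QUOTE_NONE keeps any quote characters in transcriptions as literal text.
            reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
            if set(reader.fieldnames or []) != EXPECTED_FIELDS:
                raise ValueError(f"{tsv_path}: unexpected header {reader.fieldnames}")
            for row in reader:
                row["session_dir"] = str(tsv_path.parent)
                rows.append(row)
    return rows


def check_mappings(rows: list[dict[str, str]]) -> list[str]:
    """Return a list of problems: odd file names, duplicate entries, missing audio."""
    problems: list[str] = []
    seen: set[str] = set()
    for row in rows:
        name = row["audio_filename"]
        if not FILENAME_RE.fullmatch(name):
            problems.append(f"unexpected file name: {name}")
        if name in seen:
            problems.append(f"duplicate entry: {name}")
        seen.add(name)
        # Assumes the MP3 lives in the same directory as its mapping.tsv.
        if not (Path(row["session_dir"]) / name).is_file():
            problems.append(f"missing audio: {name}")
    return problems
```

If the layout assumptions hold, `load_mappings` on the extracted dataset root should return 1,521 rows and `check_mappings` an empty list; any reported problem points to either a data issue or a wrong assumption about the layout.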
| Audio filename | Sentence |
|---|---|
| f909aa00d77ab80c4c13944135d02b2b.mp3 | Ndé Ɓakálá ɓá Jáman ɓá pɔínɔ̄ na ɓaɓɔ́ ɓá wáná mɔ́? |
| ca81bf48716b3812c4611ad5bd1cc2ef.mp3 | Hm. Níka o maɓolá eɓoló tɔ o sí maɓolá eɓoló o tá ndé ó Sáwá Tási ? |
| ee91b54b5325b54c205d97545b17c3b3.mp3 | Di tá dí ɓɛ́nɛ́ ótên esukúlu á gɔ́bina mɔ́mɛ́nɛ́. |
| 0c83dc83017caae2173ad24e79a604f5.mp3 | Ee, madɔ́kita má tâ madɔ́kita má tâ, madɔ́kita máadi pɛ́ má tâ ó ɓepólo nyay na nyay. |
| 022c4208667262a2b57956a149a17912.mp3 | Na wonja ndé mɔ́ pɛ́ ndé mí e elɔ́ŋgi á ekombo. Mɔ́ ndé mí e melodie má ekombo. |
| e1731df56954dffbdbb8d41642218348.mp3 | Ee, ɓá maɓolá ndé ekombo mɔní, ɓá maɓolá pɛ́ taɓako, níka mɔní mú sí tanɔ́ mú bíánɛ́ ɓwǎm ɓwam. |
| 8879a8e4eb8ecba5b6b00fc1b5348817.mp3 | O bí ná Ɓakálá ɓá tá ɓá ɗá súe jǐta. |
| c700a87e96cf2efecbf7004e9149ba65.mp3 | O sí mawunja pɛ́? O titíná pɛ́ ó wunja pɛ́? |
| 2cad390c37b051a8928106e3e498337a.mp3 | Mǐndó, mí sí ta. Ɓá tá ndé Ɓakálá. |
| c8cc0e9e28badf3018a72093f6ea963e.mp3 | Níka sɔ ndé á timbínɔ̄ sɔ dubisɛ mɔ́. |
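When preparing text for a TTS model, the set of input symbols has to be fixed in advance. One possible approach, sketched here with three sentences copied from the table above, is to NFD-normalise the transcriptions so that each tone diacritic becomes its own symbol rather than multiplying the number of precomposed vowel-plus-tone letters. `SAMPLES` and `symbol_inventory` are illustrative names; running the function over all 1,521 sentences would give the full inventory.

```python
import unicodedata
from collections import Counter

# Three sentences copied from the sample table (illustration only).
SAMPLES = [
    "Ndé Ɓakálá ɓá Jáman ɓá pɔínɔ̄ na ɓaɓɔ́ ɓá wáná mɔ́?",
    "Di tá dí ɓɛ́nɛ́ ótên esukúlu á gɔ́bina mɔ́mɛ́nɛ́.",
    "Mǐndó, mí sí ta. Ɓá tá ndé Ɓakálá.",
]


def symbol_inventory(sentences: list[str]) -> Counter:
    """Count code points after NFD, so tone marks are counted apart from base letters."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(unicodedata.normalize("NFD", sentence))
    return counts


inventory = symbol_inventory(SAMPLES)
for ch, n in inventory.most_common():
    # Combining marks print on their own; the Unicode name makes them readable.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '?'):<35} {n}")
```

Keeping NFC instead would give one symbol per accented letter; whether tone should be a separate input symbol is a modelling choice, not something the dataset prescribes. Upper- and lower-case letters (Ɓ and ɓ) also remain distinct unless the text is case-folded first.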