Duala-TTS-Dataset

Description

This dataset comprises 1,521 high-quality audio recordings of read speech produced by a single Duala speaker over several sessions. Duala (ISO 639-3: dua), also known as Douala, is a Bantu language of the Niger-Congo family spoken primarily in the Littoral Region of Cameroon, notably in the city of Douala and its surrounding areas. It is a low-resource language with limited existing digital speech resources, making this dataset a significant contribution to natural language processing efforts for the language.

Audio files are provided in MP3 format (approx. 147 MB), totalling 4 hours, 30 minutes and 41.64 seconds of speech. The dataset includes 16 audio/sentence mapping files in TSV format, containing 1,521 aligned audio/sentence pairs in total. Transcriptions follow the General Alphabet of Cameroonian Languages, a standardised orthographic system based on the Latin alphabet augmented with phonetic characters and diacritical marks used to represent tonal and phonological features of Cameroonian languages.

The recordings draw on narrative texts relating to colonial encounters and experiences. These narratives originally existed as oral and audio recordings and were subsequently transcribed. The read-speech recordings therefore reflect a rich oral tradition rendered in text, offering valuable prosodic and lexical diversity for training and evaluating TTS and ASR models. The dataset is intended for research and scientific use in speech technology for Duala.

Language

Duala (ISO 639-3: dua) is a Bantu language of the Niger-Congo family spoken primarily in the Littoral Region of Cameroon, with a significant speaker community in and around the city of Douala. It belongs to the A.20 subgroup of Bantu languages (Guthrie classification). Duala has a rich nominal and verbal morphology, including noun class agreement and tonal distinctions that are grammatically and lexically significant.

Variants

The recordings in this dataset were produced by a single speaker of Duala from the Littoral Region of Cameroon. No sub-variety distinctions are encoded in the dataset, though the speaker's variety is representative of the Duala regional norm as spoken in and around Douala.

Alphabet

The orthography used in the transcription of the audio recordings follows the General Alphabet of Cameroonian Languages (GACL), a standardised writing system developed under the auspices of the Cameroonian government and adopted for the codification of the national languages of Cameroon. The GACL is built on the Latin alphabet and augmented with phonetic characters drawn from the International Phonetic Alphabet (IPA), as well as diacritical marks used to represent the tonal and phonological properties of Cameroonian languages.

For Duala specifically, the alphabet includes characters such as ɓ (bilabial implosive), ɛ (open-mid front vowel), ɔ (open-mid back vowel), and ŋ (velar nasal), which represent sounds that are phonemically distinctive in the language but absent from the standard Latin alphabet. Tones, which are lexically and grammatically contrastive in Duala, are marked by diacritical accents placed over vowels: the acute accent (´) for high tone, the grave accent (`) for low tone, the circumflex (^) for falling tone, and the macron (¯) for mid tone. Combinations of these diacritics are used to represent contour tones. The sample sentences below additionally contain ɗ and the caron (ˇ), as in ɗá and ɓwǎm; in the GACL the caron marks a rising tone. Vowel nasalisation and syllable-final nasals are also indicated where relevant. The result is a phonologically precise and consistent orthographic system that makes the transcriptions in this dataset linguistically faithful representations of the spoken Duala in the recordings.
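Because tone is written with diacritics, the same vowel can be encoded either as one precomposed code point (for example á) or as a base letter plus a combining mark, and letters such as ɔ have no precomposed accented form at all. The sketch below is not part of the dataset tooling; it only illustrates, using Python's standard unicodedata module, how transcriptions might be normalised before comparison and how the combining marks in a sentence can be inspected. The function names are hypothetical, and which normalisation form the TSV files actually use is not stated in this document.

```python
import unicodedata
from collections import Counter


def combining_mark_counts(text: str) -> Counter:
    """Count combining marks by Unicode name after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return Counter(
        unicodedata.name(ch)
        for ch in decomposed
        if unicodedata.combining(ch)  # non-zero only for combining marks
    )


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Sentence taken from the Sample section of this document.
sample = "O bí ná Ɓakálá ɓá tá ɓá ɗá súe jǐta."
for name, count in combining_mark_counts(sample).items():
    print(f"{name}: {count}")
```

Normalising both the reference text and any model output to the same form before scoring avoids counting visually identical characters as errors.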

Source

This dataset was compiled as part of a Text-to-Speech data collection initiative for Duala. The textual source material consists of narratives about colonial encounters and experiences, which originally existed as oral and audio recordings and were later transcribed. The speaker read prepared passages drawn from these transcribed narratives, providing naturalistic and culturally grounded speech data. Recordings were made across multiple sessions and subsequently curated, deduplicated and aligned with their corresponding transcriptions.

Domain

The dataset consists of prompted read speech in Duala. The textual source material derives from transcribed oral narratives on colonial encounters and experiences, a historically and culturally significant genre in the Duala oral tradition. The recordings offer good prosodic and lexical diversity, including varied sentence lengths, clause structures, and narrative registers, making them suitable for TTS model training and ASR evaluation.

Size

Total size of MP3 audio: approx. 147 MB
Total size of TSV mapping files: approx. 257 KB
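As a rough sanity check, the figures quoted in this document imply an average clip length of roughly 10.7 seconds and an average MP3 bitrate of roughly 72 kbit/s. The short calculation below reproduces that arithmetic. It assumes that "MB" means 10**6 bytes and that the 147 MB figure covers only the audio; both are assumptions, so the results are approximate.

```python
# Figures quoted in this document.
NUM_CLIPS = 1521
TOTAL_SECONDS = 4 * 3600 + 30 * 60 + 41.64  # 4 h 30 min 41.64 s
AUDIO_BYTES = 147 * 10**6  # assumption: "MB" means 10**6 bytes

# Average duration of one clip, in seconds.
mean_clip_seconds = TOTAL_SECONDS / NUM_CLIPS

# Average bitrate across the whole corpus, in kilobits per second.
mean_bitrate_kbps = AUDIO_BYTES * 8 / TOTAL_SECONDS / 1000

print(f"total duration: {TOTAL_SECONDS:.2f} s")
print(f"mean clip length: {mean_clip_seconds:.2f} s")
print(f"mean bitrate: {mean_bitrate_kbps:.1f} kbit/s")
```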

Structure

This dataset comprises audio clips and audio/text mapping files organised across 16 recording sessions. There are 1,521 audio clips in MP3 format, totalling 4 hours, 30 minutes and 41.64 seconds of speech. The dataset includes 16 audio/text mapping files (mapping.tsv), each containing aligned audio/sentence pairs for the corresponding session, with 1,521 aligned pairs in total. Each TSV file contains the following fields: audio_filename, key, sentence, attempts.
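The structure described above can be read with Python's standard csv module. The following is a minimal sketch, not official loader code. It assumes that each mapping.tsv starts with a header row naming the four fields, that the files are UTF-8 encoded, and that each session's mapping.tsv sits in the same directory as its MP3 files. The function names and the added audio_path key are hypothetical.

```python
import csv
from pathlib import Path

# Field names as listed in this document.
FIELDS = ("audio_filename", "key", "sentence", "attempts")


def load_mapping(tsv_path):
    """Read one mapping.tsv and return its rows as a list of dicts."""
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps any quotation marks inside sentences untouched.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        missing = set(FIELDS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{tsv_path}: missing columns {sorted(missing)}")
        return list(reader)


def load_dataset(root):
    """Collect rows from every mapping.tsv found under root."""
    rows = []
    for tsv_path in sorted(Path(root).rglob("mapping.tsv")):
        for row in load_mapping(tsv_path):
            # Assumption: the MP3 lives next to its session's mapping.tsv.
            row["audio_path"] = tsv_path.parent / row["audio_filename"]
            rows.append(row)
    return rows
```

If the layout matches these assumptions, calling load_dataset on the dataset's root directory should return 1,521 rows; a different count would suggest that the directory layout or header differs from what is assumed here.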

Sample

Audio filename	Sentence
f909aa00d77ab80c4c13944135d02b2b.mp3	Ndé Ɓakálá ɓá Jáman ɓá pɔínɔ̄ na ɓaɓɔ́ ɓá wáná mɔ́?
ca81bf48716b3812c4611ad5bd1cc2ef.mp3	Hm. Níka o maɓolá eɓoló tɔ o sí maɓolá eɓoló o tá ndé ó Sáwá Tási ?
ee91b54b5325b54c205d97545b17c3b3.mp3	Di tá dí ɓɛ́nɛ́ ótên esukúlu á gɔ́bina mɔ́mɛ́nɛ́.
0c83dc83017caae2173ad24e79a604f5.mp3	Ee, madɔ́kita má tâ madɔ́kita má tâ, madɔ́kita máadi pɛ́ má tâ ó ɓepólo nyay na nyay.
022c4208667262a2b57956a149a17912.mp3	Na wonja ndé mɔ́ pɛ́ ndé mí e elɔ́ŋgi á ekombo. Mɔ́ ndé mí e melodie má ekombo.
e1731df56954dffbdbb8d41642218348.mp3	Ee, ɓá maɓolá ndé ekombo mɔní, ɓá maɓolá pɛ́ taɓako, níka mɔní mú sí tanɔ́ mú bíánɛ́ ɓwǎm ɓwam.
8879a8e4eb8ecba5b6b00fc1b5348817.mp3	O bí ná Ɓakálá ɓá tá ɓá ɗá súe jǐta.
c700a87e96cf2efecbf7004e9149ba65.mp3	O sí mawunja pɛ́? O titíná pɛ́ ó wunja pɛ́?
2cad390c37b051a8928106e3e498337a.mp3	Mǐndó, mí sí ta. Ɓá tá ndé Ɓakálá.
c8cc0e9e28badf3018a72093f6ea963e.mp3	Níka sɔ ndé á timbínɔ̄ sɔ dubisɛ mɔ́.