License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Dataset ID:
cmqf6c9lc06gil2075rm34xds
Task: TTS
Release Date: 6/15/2026
Format: MP3, TSV
Size: 43.03 MB
Mbo-TTS-Dataset is a scripted speech dataset dedicated to the documentation and technological development of Mbo (ISO 639-3: mbo), a Bantu language spoken in the Moungo Division of the Littoral Region of Cameroon. The dataset was compiled in the framework of the Mozilla Data Collective initiative (2026), as a supplement to the Common Voice Scripted Speech 25.0 – Mbo dataset (https://mozilladatacollective.com/datasets/cmn1qc3ct00zemm07h05b4qls).

The dataset comprises 982 high-quality MP3 audio recordings of Mbo sentences read by a native speaker across 10 recording sessions, together with per-session sentence-to-audio mapping files enabling precise alignment between textual and acoustic data. Sentences were drawn from a scripted speech prompt list and read in a controlled environment.

The transcription of all sentences follows the General Alphabet of Cameroon's Languages (AGLC; French acronym: Alphabet Général des Langues Camerounaises), the reference standard for Cameroonian national languages. The Mbo orthography employed in this dataset is distinguished by a rich set of vowel symbols, including the open-mid front unrounded vowel ɛ, the open-mid back rounded vowel ɔ, the mid-central vowel ə (schwa), and the high central rounded vowel ʉ, as well as a multi-register tone-marking system combining level (acute, macron, grave) and contour (caron, circumflex) diacritics applied to all vowel symbols and syllabic nasals. A voiced bilabial implosive consonant is represented by ɓ. Glottal closure is marked by the modifier letter apostrophe (ʼ).

The parallel availability of AGLC-transcribed text and aligned speech makes the dataset suitable for a wide range of applications, including text-to-speech (TTS) synthesis, automatic speech recognition (ASR), forced alignment, pronunciation modelling, and language learning tools. It also directly supports efforts to standardise and normalise the digital representation of Mbo in language technology contexts.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
By downloading this dataset, you agree:
- To use it for research and scientific use only
- That you will not re-host or re-share this dataset
Forbidden Usage
You agree not to use the data for: determining the identity of any speaker in the dataset; attempting to clone any voice or train models that imitate any speaker in this dataset; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
(a) Speech-related tasks:
- Text-to-speech (TTS) synthesis: The dataset provides clean sentence–audio pairs from multiple recording sessions and is directly suited for training, fine-tuning, and evaluating speech synthesis models for Mbo. The availability of AGLC-transcribed sentences with aligned audio enables the development of TTS systems capable of producing natural-sounding Mbo speech.
- Automatic speech recognition (ASR): Audio–text alignment enables the training and evaluation of speech recognition models for Mbo. The per-session structure and controlled recording conditions make the dataset suitable for building and evaluating ASR models for this under-resourced language.
- Speech–text alignment / forced alignment benchmarking: Fine-grained audio–text pairing provides ground truth for evaluating phoneme- or word-level aligners adapted to Bantu languages of the Moungo area.
- Pronunciation modelling: The AGLC-transcribed sentences, combined with aligned audio, provide a resource for developing grapheme-to-phoneme (G2P) models and pronunciation lexicons for Mbo.

(b) Linguistic and lexicographic tasks:
- Phonological analysis: The dataset enables systematic study of the phonological and tonal system of Mbo, including its multi-register tone system and the distribution of special vowels (ɛ, ɔ, ə, ʉ), implosive consonants (ɓ), and contour tones.
- Orthographic standardisation and normalisation: The dataset can serve as a reference corpus for evaluating and training text normalisation models aligned with the AGLC standard for Mbo.
- Language documentation: The dataset contributes to the digital documentation of Mbo scripted speech in AGLC orthography, supporting efforts to extend the digital presence of this Bantu language of the Littoral Region of Cameroon.
Mbo is a Bantu language belonging to the Niger-Congo phylum, classified within the Mbam-Nkam branch. It is spoken primarily in the Moungo Division of the Littoral Region of Cameroon. Despite its sociolinguistic significance within Cameroon, Mbo remains substantially underrepresented in language technology resources.
According to the Administrative Atlas of Cameroon's Languages (Breton & Bikia Fohtung 1991), Mbo comprises the following dialects:
Bonkeŋ
Central-Mbo
Ehɔw
Mba
Ehɔ Mbo
Alɛ mbuu
Bakem
The writing system used for the transcription of Mbo in this dataset is the General Alphabet of Cameroon's Languages (AGLC), as adopted by the Ministry of Basic Education of Cameroon and regularly updated by the Direction de la Promotion des Langues Nationales. The AGLC provides a phonologically motivated orthographic standard for Cameroonian national languages and serves as the reference framework for Mbo literacy materials.
The vowel system attested in the dataset includes the following oral vowels:
a, e, i, o, u, ɛ, ɔ, ə, ʉ
Where:
ɛ (epsilon): open-mid front unrounded vowel
ɔ (open-o): open-mid back rounded vowel
ə (schwa): mid-central vowel
ʉ (barred u): high central rounded vowel
Long vowels are represented by vowel doubling (e.g., aa, ɛɛ, ɔɔ, əə).
The consonant inventory reflected in the dataset includes simple and digraph consonants:
b, c, d, f, g, h, j, k, l, m, n, p, s, sh, t, v, w, y, z, ŋ, ɓ
Special symbols:
ŋ (eng): velar nasal consonant
ɓ (b with hook): voiced bilabial implosive consonant
ʼ (modifier letter apostrophe): glottal stop / glottal closure marker
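The inventory above lends itself to a simple sanity check on transcriptions. The following is a minimal sketch, not part of the dataset tooling: the function name `unexpected_characters` is hypothetical, and the choice to ignore whitespace, punctuation, and all combining diacritics is an assumption made for illustration.

```python
import unicodedata

# Base letters documented in the dataset card (lowercase). The digraph "sh"
# is covered by its component letters "s" and "h".
VOWELS = set("aeiouɛɔəʉ")
CONSONANTS = set("bcdfghjklmnpstvwyzŋɓ")
GLOTTAL = {"\u02bc"}  # modifier letter apostrophe

def unexpected_characters(sentence):
    """Return the set of base characters not covered by the documented inventory."""
    allowed = VOWELS | CONSONANTS | GLOTTAL
    unexpected = set()
    # NFD splits precomposed letters (e.g. "é") into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", sentence.lower()):
        if unicodedata.combining(ch):
            continue  # tone marks and other combining diacritics
        if ch.isspace() or unicodedata.category(ch).startswith("P"):
            continue  # whitespace and punctuation (assumed irrelevant here)
        if ch not in allowed:
            unexpected.add(ch)
    return unexpected
```

A sentence that stays within the inventory yields an empty set, so a non-empty result can flag lines worth reviewing before training.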
Mbo attests syllabic nasal consonants that function as tone-bearing units. The following tone-marked syllabic nasals are represented in the dataset:
ḿ (m with acute): syllabic bilabial nasal, high tone
m̀ (m with grave): syllabic bilabial nasal, low tone
ń (n with acute): syllabic alveolar nasal, high tone
ǹ (n with grave): syllabic alveolar nasal, low tone
ŋ̀ (eng with grave): syllabic velar nasal, low tone
Mbo is a tonal language with multiple contrastive pitch levels and contour tones. The dataset employs systematic tone marking on vowels and syllabic nasals in accordance with the AGLC convention. The following diacritics are attested in the dataset:
Level tones:
High tone (H): acute accent — á, é, í, ó, ú, ɛ́, ɔ́, ə́, ʉ́
Mid tone (M): macron — ā, ē, ī, ō, ū, ɛ̄, ɔ̄, ə̄, ʉ̄
Low tone (L): grave accent — à, è, ì, ò, ù, ɛ̀, ɔ̀, ə̀, ʉ̀
Contour tones:
Rising tone (LH): caron — ǎ, ě, ǐ, ǒ, ǔ, ɛ̌, ɔ̌, ə̌
Falling tone (HL): circumflex — â, ê, î, ô, û, ɛ̂, ə̂
Additional diacritics attested in the dataset include the combining tilde below (̰), reflecting fine-grained phonological distinctions in the Mbo sound system.
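For downstream processing it can help to read the tone diacritics programmatically. The sketch below assumes the tone marks are encoded as the standard Unicode combining characters (possibly precomposed with their base letter), which the card does not state explicitly. The function name `tone_marks` is hypothetical. Only letters that carry one of the five listed diacritics are reported; nothing is assumed about unmarked letters.

```python
import unicodedata

# Combining marks listed in the card, mapped to the tone labels it uses.
TONE_MARKS = {
    "\u0301": "H",   # acute
    "\u0304": "M",   # macron
    "\u0300": "L",   # grave
    "\u030c": "LH",  # caron
    "\u0302": "HL",  # circumflex
}

def tone_marks(text):
    """Return (base letter, tone label) pairs for every tone-marked letter."""
    pairs = []
    base = ""
    # NFD guarantees that each tone mark appears as a separate combining character.
    for ch in unicodedata.normalize("NFD", text):
        if ch in TONE_MARKS:
            if base:
                pairs.append((base, TONE_MARKS[ch]))
        elif not unicodedata.combining(ch):
            base = ch  # remember the most recent non-combining character
    return pairs
```

Normalising to NFD first means the function behaves the same whether a file stores `é` as one code point or as `e` followed by a combining acute.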
The dataset was compiled from scripted speech prompt lists read by native speakers of Mbo in recording sessions held at the École Normale Supérieure de Yaoundé in June 2026, in the framework of the Mozilla Data Collective project. Sentences were selected to provide broad phonological coverage of Mbo and were transcribed in accordance with the AGLC orthographic standard.
The dataset represents scripted speech in Mbo, covering a broad range of everyday sentence types drawn from a general-purpose TTS/ASR prompt list. All utterances are scripted rather than spontaneous.
Total audio duration: 3,851 seconds (01h 04m 11s), distributed across 982 MP3 audio clips in 10 recording sessions.
The dataset is organised into 10 recording sessions:
Session tts_dataset_mbo_01: 89 clips (06m 09s)
Session tts_dataset_mbo_02: 100 clips (06m 16s)
Session tts_dataset_mbo_03: 100 clips (05m 25s)
Session tts_dataset_mbo_04: 100 clips (05m 19s)
Session tts_dataset_mbo_05: 100 clips (07m 39s)
Session tts_dataset_mbo_06: 99 clips (07m 20s)
Session tts_dataset_mbo_07: 99 clips (06m 41s)
Session tts_dataset_mbo_08: 100 clips (06m 46s)
Session tts_dataset_mbo_09: 100 clips (07m 09s)
Session tts_dataset_mbo_10: 95 clips (05m 22s)
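The per-session figures can be cross-checked against the stated totals with a few lines of arithmetic. This sketch only re-adds the numbers listed above; the helper `to_seconds` is a hypothetical name.

```python
# (clip count, duration) per session, copied from the session list.
SESSIONS = {
    "tts_dataset_mbo_01": (89, "06m 09s"),
    "tts_dataset_mbo_02": (100, "06m 16s"),
    "tts_dataset_mbo_03": (100, "05m 25s"),
    "tts_dataset_mbo_04": (100, "05m 19s"),
    "tts_dataset_mbo_05": (100, "07m 39s"),
    "tts_dataset_mbo_06": (99, "07m 20s"),
    "tts_dataset_mbo_07": (99, "06m 41s"),
    "tts_dataset_mbo_08": (100, "06m 46s"),
    "tts_dataset_mbo_09": (100, "07m 09s"),
    "tts_dataset_mbo_10": (95, "05m 22s"),
}

def to_seconds(duration):
    """Convert a 'MMm SSs' string to a whole number of seconds."""
    minutes, seconds = duration.replace("m", "").replace("s", "").split()
    return int(minutes) * 60 + int(seconds)

total_clips = sum(clips for clips, _ in SESSIONS.values())
total_seconds = sum(to_seconds(d) for _, d in SESSIONS.values())

print(total_clips)    # 982
print(total_seconds)  # 3846
```

The clip counts add up to the stated 982. The per-session durations add up to 3,846 seconds, 5 seconds short of the stated total of 3,851 seconds. A gap of that size would be consistent with each session duration having been rounded or truncated to whole seconds, though the card does not say how the figures were derived.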
Each session folder contains:
MP3 audio clips
One per-session sentence-to-audio mapping file (mapping.tsv), with 4 columns:
#audio_filename: filename of the audio clip (MP3)
#key: unique hash identifier of the recording
#sentence: sentence text as read by the speaker, transcribed in AGLC orthography
#attempts: number of recording attempts before acceptance
| audio file | sentence (Mbo, AGLC) |
|---|---|
| af0a03b92a99e7d70a08d43b2c8f192a.mp3 | Akɔŋki a kolɛɛ étóó mbwá. |
| 639d005d79582b79cb060631f3afa4b9.mp3 | Ŋkɛn ni dyam abum, nlóŋ ni dyam abum. |
| 3284ba083cb51567e069ba56fc381c9b.mp3 | Nzɛ́ɛ́ ní wômpɛ mi gwɛ́ ibɔnkí nɛ́ m̀ pə́ə́tɛ́ a bɔti |
| eb61ac69c76f581ee079d9f46fffff03.mp3 | M̀pʉ́ kasɛ́lɛ nɛ́ í gwɛɛ |
| 809e6d474c1778d825e91576d58787eb.mp3 | Síní butɛ kɛ́' a byê bóni ŋ̀kʉtɛ ǹsa' |
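A session folder laid out as described can be read into sentence–audio pairs with the standard library. This is a minimal sketch under several assumptions not confirmed by the card: that `mapping.tsv` is UTF-8 encoded and tab-delimited, and that its first row is a header naming the four columns, with or without the leading `#`. The function name `load_session` is hypothetical.

```python
import csv
from pathlib import Path

def load_session(session_dir):
    """Yield (audio_path, sentence) pairs from one session's mapping.tsv."""
    session_dir = Path(session_dir)
    with open(session_dir / "mapping.tsv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps apostrophes and quotes in sentences as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Tolerate header names written with or without a leading "#".
            row = {k.lstrip("#"): v for k, v in row.items() if k is not None}
            yield session_dir / row["audio_filename"], row["sentence"]
```

Calling `list(load_session("tts_dataset_mbo_01"))` would then give one pair per clip in that session, provided the folder name and header match the assumptions above.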