License: NOODL-1.0
Steward: Institute of African Digital Humanities
Dataset ID: cmnhjbnjp0115mh07kiha0rei
Task: TTS
Release Date: 4/2/2026
Format: MP3, TSV
Size: 219.97 MB
This dataset comprises audio recordings of Bamun (Shupamem) speech aligned with textual transcriptions. The dataset is structured into 24 folders totalling 4h 30m 25s, each containing audio files and a corresponding audio-text mapping file. The audio clips are short, typically ranging from 1 to 10 seconds, and are suitable for training and evaluating Text-to-Speech (TTS) systems. The dataset follows a structured format where each audio file is paired with its corresponding transcription in a tab-separated mapping file. The textual content used in this dataset originates from transcriptions of oral narratives documenting personal histories related to German colonisation in Cameroon. These texts were segmented into short utterances suitable for read speech and TTS modelling.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
- For research and scientific use only
- You agree not to re-host or redistribute this dataset
Forbidden Usage
- Generative AI
- Voice cloning or speaker imitation
- Reproduction, duplication, modification, or redistribution
- Commercial use without explicit permission
Intended Use
This dataset is intended for the training and evaluation of Text-to-Speech (TTS) systems for the Bamun language. It aims to support:
- Language revitalisation
- Development of speech technologies for under-served African languages
- Educational applications in multilingual contexts
Bamun, also called Shüpamom or Shupamem, is a Grassfields Bantu language spoken in the Noun Division of the West Region of Cameroon.
The Bamun language is quite homogeneous within its indigenous territory, the Noun Administrative Division. However, the Administrative Atlas of Cameroon's Languages (Breton and Bikia Fohtung, 1991) indicates a few "islands" outside the Noun Division where the language exhibits minor variations. These include Bapi in the Mifi Division of the West Region, and Bamalang and Bangolan in the Mezam Division of the Northwest Region.
The vowel inventory reflected in the dataset is: i, e, ɛ, a, ɔ, o, u, ʉ, ə
The vowel ə / ә is particularly frequent and functions as a central vowel.
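The two glyphs shown for the central vowel look identical but appear to be different code points: Latin ə (U+0259) and Cyrillic ә (U+04D9). If both occur in the transcriptions, a TTS front end would treat them as two separate symbols. The sketch below folds the Cyrillic form into the Latin one. The function name `normalize_schwa` is hypothetical, and whether this folding is desirable for a given pipeline is an assumption, not something the dataset documentation specifies.

```python
# Map Cyrillic schwa (lower and upper case) onto the Latin schwa.
# Assumption: both forms are meant to represent the same vowel.
_SCHWA_TABLE = str.maketrans({
    "\u04D9": "\u0259",  # Cyrillic small schwa  -> Latin small schwa
    "\u04D8": "\u018F",  # Cyrillic capital schwa -> Latin capital schwa
})


def normalize_schwa(text: str) -> str:
    """Return text with Cyrillic schwa replaced by Latin schwa."""
    return text.translate(_SCHWA_TABLE)


cleaned = normalize_schwa("\u014Bw\u04D9t")  # a word written with the Cyrillic form
```

Applying this step before building a symbol inventory keeps the vowel set aligned with the nine vowels listed above.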
The consonant system includes the following simple consonants: b, d, f, g, h, j, k, l, m, n, ŋ, p, r, s, t, v, w, y, z
Complex and cluster-like consonants attested include: mb, nd, nk, ng, nt, nj, mf, kp
Digraphs: sh, gh
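As an illustration of how the inventory above could feed a TTS text front end, the following sketch splits a word into orthographic units, treating the listed clusters and digraphs as single units and keeping tone diacritics attached to their base letter. The function name `segment` and the greedy left-to-right matching are assumptions made for this example; greedy matching may group letters incorrectly across syllable boundaries, so the output should be checked against the language's phonology.

```python
import unicodedata

# Two-letter units taken from the lists above.
MULTIGRAPHS = ("mb", "nd", "nk", "ng", "nt", "nj", "mf", "kp", "sh", "gh")


def segment(word: str) -> list:
    """Greedily split a word into orthographic units (hypothetical helper)."""
    # Decompose so that tone marks become separate combining characters.
    text = unicodedata.normalize("NFD", word.lower())
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in MULTIGRAPHS:
            units.append(pair)
            i += 2
        elif unicodedata.combining(text[i]) and units:
            units[-1] += text[i]  # attach the diacritic to the previous unit
            i += 1
        else:
            units.append(text[i])
            i += 1
    # Recompose each unit where a precomposed form exists.
    return [unicodedata.normalize("NFC", u) for u in units]


units = segment("nguu")
```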
The transcription encodes lexical tone using diacritics, corresponding to standard tonal categories:
High tone (H): marked with acute accent (á, é, ɔ́, ʉ́, ŋ́)
Low tone (L): marked with grave accent (à, è, ɔ̀)
Mid tone (M): marked with macron (ā, ē)
Rising tone (LH): marked with caron (ǎ, ě, ɔ̌)
Falling tone (HL): marked with circumflex (â, ê)
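The tone marking above maps onto Unicode combining characters, so tone labels can be read off a transcription after canonical decomposition (NFD). The sketch below is one way to do that; the name `tones` and the label strings are choices made for this example, and unmarked vowels are simply skipped because the documentation does not say how they should be interpreted.

```python
import unicodedata

# Combining diacritics and the tone categories they mark in this dataset.
TONE_MARKS = {
    "\u0301": "H",   # acute accent
    "\u0300": "L",   # grave accent
    "\u0304": "M",   # macron
    "\u030C": "LH",  # caron
    "\u0302": "HL",  # circumflex
}


def tones(text: str) -> list:
    """Return the tone labels marked in text, in order of appearance."""
    decomposed = unicodedata.normalize("NFD", text)
    return [TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS]


labels = tones("Nd\u01D4 l\u0289\u0301m m\u00FA")  # ['LH', 'H', 'H']
```

Decomposing first matters because some marked letters (for example ɔ́ and ʉ́) have no precomposed code point, while others (á, ǎ) do; NFD makes both cases look the same.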
This dataset originates from audio recordings documenting personal histories of German colonisation. These recordings were made in the early 1980s as part of a research project led by Prince (Professor) Koum A Ndoumbe III.
Abdou Salam Ntieche Fifen created the transcriptions associated with this dataset. The transcriptions were made in 2017.
For the purpose of creating this dataset, the textual material was segmented into short utterances and aligned with corresponding audio recordings to support TTS modelling.
This dataset is derived from prompted speech in the form of directed interviews. The content reflects personal narratives related to colonial history in Cameroon.
The dataset has been transformed into read-style segmented speech suitable for speech synthesis tasks.
The dataset is composed of 24 folders containing audio clips and corresponding mapping files.
Each folder in the dataset contains:
- A collection of approximately 10 to 280 audio files in MP3 format, with individual clips typically ranging from 1 to 10 seconds in duration
- A tab-separated mapping file linking each audio file to its transcription
Folder-level durations range from approximately 1 minute to over 18 minutes of audio. The dataset therefore represents several hours of segmented Bamun speech data.
The total duration of the recordings is 4h 30m 25s.
Each line in the mapping file follows the format:
audio_filename.mp3 | transcription
The dataset is designed for TTS pipelines requiring paired audio-text data. Example entries:
03246844d87f5a76ec4fc1f636626bb5.mp3 | Euh mí u tóóshә́ ŋwәt ru
14289dc77904b3edb98afcfbb5776ee1.mp3 | Í nzie Li shá?
2579e22b2b248815938631969ae22200.mp3 | Li shú, nә nguu yúá
33380524f516027e3b6acad30c6a4f0f.mp3 | Ndǔ lʉ́m mú
364c11dd9cc6d98ae53d9fca5ef0b374.mp3 | Mbúá' NJI FIFEN
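A minimal sketch of reading such entries into (filename, transcription) pairs follows. The format line and the entries above show a `|` separator, while the mapping file is described as tab-separated, so the separator is left as a parameter; which one the files on disk actually use is not confirmed here. The function name `parse_mapping` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_mapping(lines: Iterable[str], sep: str = "|") -> Iterator[Tuple[str, str]]:
    """Yield (audio_filename, transcription) pairs from mapping-file lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first separator only, so the transcription may contain it.
        filename, found, text = line.partition(sep)
        if not found:
            raise ValueError(f"separator {sep!r} not found in line: {line!r}")
        yield filename.strip(), text.strip()


sample = [
    "14289dc77904b3edb98afcfbb5776ee1.mp3 | Í nzie Li shá?",
    "33380524f516027e3b6acad30c6a4f0f.mp3 | Ndǔ lʉ́m mú",
]
pairs = list(parse_mapping(sample))
```

For a file that really is tab-separated, the same function can be called with `sep="\t"` on lines read from a file opened with `encoding="utf-8"`.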