License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Dataset ID:
cmp0f3tib02a4mp07ch5tr808
Task: TTS
Release Date: 5/10/2026
Format: MP3, TSV
Size: 169.27 MB
This dataset comprises 1,303 high-quality audio recordings of read speech produced by a single Adamawa Fulfulde speaker over several months. Adamawa Fulfulde (ISO 639-3: fub), also known as Fula Adamawa, is a Niger-Congo language spoken in the Adamawa region of Cameroon and in adjacent areas of Chad and Nigeria. It is a low-resource language with few existing digital speech resources, so this dataset is a significant contribution to natural language processing work on the language.
The audio files are provided in MP3 format (176 MB) and total 3 hours, 31 minutes and 18.92 seconds of speech. The dataset also includes an audio/sentence mapping file in TSV format (Mapping_MP3.tsv) containing 1,302 aligned audio/sentence pairs. Transcriptions follow the standard Latin-based Fulfulde orthography, which includes special characters such as ɓ, ɗ, ƴ, ŋ and ɲ and marks vowel length by doubling (aa, oo, uu, etc.). Tones are not marked.
The recordings cover a broad range of everyday topics and narrative genres (storytelling, dialogue, description and conversational interaction), providing good prosodic and lexical diversity for training and evaluating TTS and ASR models. The dataset is intended for research and scientific use in speech technology for Adamawa Fulfulde.
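As a quick sanity check on these figures, the stated total duration can be converted to seconds and divided by the number of recordings to give a mean clip length. This is a minimal sketch that assumes the 3 h 31 min 18.92 s total covers all 1,303 recordings; the card does not give per-clip durations, so the result is only an average.

```python
# Figures taken from this dataset card; assumes the total duration
# covers all 1,303 recordings.
HOURS, MINUTES, SECONDS = 3, 31, 18.92
NUM_CLIPS = 1303

# Convert the h/min/s total to seconds.
total_seconds = HOURS * 3600 + MINUTES * 60 + SECONDS

# Average length of one recording (individual lengths will vary).
mean_clip_seconds = total_seconds / NUM_CLIPS

print(f"total: {total_seconds:.2f} s")
print(f"mean clip length: {mean_clip_seconds:.2f} s")
```

This prints a total of 12678.92 s and a mean of about 9.73 s per recording.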
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
- For research and scientific use only
- You agree that you will not re-host or re-share this dataset
Forbidden Usage
You agree not to use the data for: determining the identity of the speakers in the dataset; attempting to clone the voice of, or training models that imitate, the speakers in this dataset; generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
The dataset is suitable for speech-related tasks, in particular Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) for Adamawa Fulfulde. Its aligned audio and text allow speech synthesis and speech recognition models to be trained or evaluated, supporting the development of more inclusive and representative TTS and ASR tools for Adamawa Fulfulde, a low-resource African language with limited existing digital speech resources.
Adamawa Fulfulde (ISO 639-3: fub) is a variety of Fulfulde spoken primarily in the Adamawa region of Cameroon and in adjacent areas of Chad and Nigeria. It belongs to the Niger-Congo language family and is part of the broader Fula dialect continuum. Adamawa Fulfulde has a rich system of noun classes marked by suffixal morphology, and it uses a modified Latin orthography that includes several special characters (e.g., ɓ, ɗ, ŋ, ƴ, ɲ) to represent sounds absent from the standard Latin alphabet.
The recordings in this dataset were produced by a single speaker of Adamawa Fulfulde from the Adamawa region of Cameroon. No sub-variety distinctions are encoded in the dataset, though the speaker's variety is representative of the Adamawa regional norm.
The orthography used to transcribe the audio recordings follows the standard Fulfulde writing conventions adopted for Adamawa Fulfulde. These conventions build on the Latin alphabet and add the following characters: ɓ, ɗ, ƴ, ŋ, ɲ, ʼ. Vowel length is marked by doubling (e.g., aa, oo, uu, ee, ii).
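When preparing text for TTS or ASR experiments, it can help to check transcriptions against these conventions. The sketch below is illustrative and not part of the dataset's own tooling: it reports which of the special characters listed above occur in a sentence and extracts doubled-vowel (long-vowel) sequences. The function names `special_letters_in` and `long_vowels_in` are hypothetical, and the sketch assumes text is compared after Unicode NFC normalisation and lowercasing. Note that the sample transcriptions in this card write the apostrophe as an ASCII `'` (as in ta'i) rather than ʼ, so the sketch does not count it.

```python
from __future__ import annotations

import re
import unicodedata

# Special characters listed in this card's orthography notes (lowercase forms).
SPECIAL_LETTERS = {"ɓ", "ɗ", "ƴ", "ŋ", "ɲ", "ʼ"}

# Assumption from the card: a long vowel is the same vowel letter written twice.
LONG_VOWEL = re.compile(r"([aeiou])\1")


def _normalise(text: str) -> str:
    # NFC keeps precomposed characters together; lower() maps e.g. Ɓ -> ɓ.
    return unicodedata.normalize("NFC", text).lower()


def special_letters_in(text: str) -> set[str]:
    """Return the special orthographic characters that occur in text."""
    return {ch for ch in _normalise(text) if ch in SPECIAL_LETTERS}


def long_vowels_in(text: str) -> list[str]:
    """Return every doubled-vowel sequence in text, left to right."""
    return [m.group(0) for m in LONG_VOWEL.finditer(_normalise(text))]
```

For example, on the sample sentence "Hmm booɗɗum.", `special_letters_in` returns `{"ɗ"}` and `long_vowels_in` returns `["oo"]`; the doubled consonant ɗɗ is ignored because the pattern only matches vowels.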
This dataset was compiled as part of a Text-to-Speech data collection initiative for Adamawa Fulfulde. Audio recordings were produced using a structured prompting methodology in which the speaker read prepared sentences and passages in Adamawa Fulfulde. The recordings were made over several months and subsequently curated, deduplicated and aligned with their corresponding transcriptions.
The dataset consists of prompted read speech in Adamawa Fulfulde. The texts cover a broad range of everyday topics and narrative genres, including storytelling, dialogue, description and conversational interaction, offering good prosodic and lexical diversity for TTS model training.
Total size of MP3 audio: 176 MB
This dataset comprises audio clips and an audio/text mapping file. There are 1,303 audio clips in MP3 format, totalling 3 hours, 31 minutes and 18.92 seconds. The dataset includes one audio/text mapping file, Mapping_MP3.tsv, containing 1,302 aligned audio/sentence pairs.
00164611bb8a70b97b42d681514dab2d.mp3 Laamɗo kadi, a wii no yoo a yiɗaa debbo on, naa a semti laamɗo, laari ndaynuɗa mo naa?
00c8f1bad9e8f7b7364a5c7b6ff2d905.mp3 Hmm booɗɗum.
00de24224588692e0e94d4df8dba2776.mp3 Kadi boo o sakkini daande o wii :
0120d1d1a43e102e06b4671fe819cfbd.mp3 Takala mulus, takkaande mus
01365d94e3f176aa59c81749573e329d.mp3 baaba goo hokki mo limce, o wii : hokkoram puccu mi ƴamtonoyte nyamaande ndee.
015580bb191df49db97c9071c012f3be.mp3 Jam, caycayɗo goo hooci jiire bee gaaraaji mum, hooci, hooci, hooci haa ɓaawo dilli, yehi caka cak maayo nii ta'i gaaraaji goo, acci jiire goo nder ndiyam.
0183b32da92db698ce0a14cecce8bd39.mp3 o ɗon suɓta, hamman ɗon taada Mbuulu muuɗum.
01bbd04dc00c5a5de1874c5b3d146d8a.mp3 Suy o hooci o waati haa booro maako goo, booro maako boo jakan ɗon bee wurde. Debbo maako ummake ɗon dilla, kanko boo o wakkake booro maako bee jawngal gootal nder ton goo, o ɗon yaawa haa o huuca o wula o ƴakka kadi,
0227b8c7014b2ba5b9a07ca0136fe587.mp3 ɓe mbii : ɗoo nii teema goɗɗo ɓe ngasi ɓe ngasi ɓe tawi puldebbo goo,
0108230a534d321f05a7c4932f3c64f7.mp3 Yoo nde o hooci ɓinngel goo, o ɗon jogi, o ɗon jogi, o ɗon jogi haa ɓinngel mawni, haa waɗi enɗi nii, gorko wari muuyi ɗum ɓaŋi, dilliri haa lesdi ndaayiindi.
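The sample rows above pair an MP3 file name with its transcription. A minimal loader for Mapping_MP3.tsv might look like the following. It assumes, from the sample rows and the .tsv extension, that each line holds exactly two tab-separated fields (file name, then sentence) and, by default, that there is no header row; the function names `load_mapping` and `unpaired_audio` are hypothetical. Because the card reports 1,303 audio files but 1,302 mapping entries, the second helper lists any MP3 files in a directory that have no transcription.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_mapping(tsv_path: str | Path, has_header: bool = False) -> dict[str, str]:
    """Read the mapping file into {mp3_file_name: sentence}.

    Assumes two tab-separated fields per line: file name, then transcription.
    """
    mapping: dict[str, str] = {}
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps any quote characters in the transcriptions verbatim
        # instead of treating them as CSV quoting.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # skip the header row, if the file has one
        for row in reader:
            if not row:
                continue  # ignore blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {reader.line_num}: expected 2 fields, got {len(row)}"
                )
            file_name, sentence = row[0].strip(), row[1].strip()
            mapping[file_name] = sentence
    return mapping


def unpaired_audio(audio_dir: str | Path, mapping: dict[str, str]) -> list[str]:
    """Return the MP3 files in audio_dir that have no entry in the mapping."""
    on_disk = {p.name for p in Path(audio_dir).glob("*.mp3")}
    return sorted(on_disk - set(mapping))
```

Run from a local copy of the dataset, `pairs = load_mapping("Mapping_MP3.tsv")` followed by `unpaired_audio(".", pairs)` would list the recording(s) without a transcription (the paths here are placeholders). Disabling CSV quoting is a deliberate choice: Python's default dialect would otherwise treat a leading `"` in a transcription as a quoted field and alter the text.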