Adamawa Fulfulde-TTS-Dataset

Description

This dataset comprises 1,303 high-quality audio recordings of read speech produced by a single Adamawa Fulfulde speaker over several months. Adamawa Fulfulde (ISO 639-3: fub), also known as Fula Adamawa, is a language of the Niger-Congo family spoken in the Adamawa region of Cameroon and in adjacent areas of Chad and Nigeria. It is a low-resource language with few existing digital speech resources, so the dataset is intended to support natural language processing work on the language.

Audio files are provided in MP3 format (176 MB) and total 3 hours, 31 minutes and 18.92 seconds of speech. An audio/sentence mapping file in TSV format (Mapping_MP3.tsv) contains 1,302 aligned audio/sentence pairs. Transcriptions follow the standard Fulfulde Latin-based orthography, which includes special characters such as ɓ, ɗ, ƴ, ŋ and ɲ and marks vowel length by doubling (aa, oo, uu, etc.). Tones are not marked.

The recordings cover a broad range of everyday topics and narrative genres (storytelling, dialogue, description and conversational interaction), providing prosodic and lexical diversity for training and evaluating TTS and ASR models. The dataset is intended for research and scientific use in speech technology for Adamawa Fulfulde.

Language

Adamawa Fulfulde (ISO 639-3: fub) is a variety of Fulfulde spoken primarily in the Adamawa region of Cameroon and in adjacent areas of Chad and Nigeria. It belongs to the Niger-Congo language family and is part of the broader Fula dialect continuum. Adamawa Fulfulde has a rich system of noun classes marked by suffixal morphology, and it uses a modified Latin orthography that includes several special characters (e.g., ɓ, ɗ, ŋ, ƴ, ɲ) to represent sounds absent from the standard Latin alphabet.

Variants

The recordings in this dataset were produced by a single speaker of Adamawa Fulfulde from the Adamawa region of Cameroon. No sub-variety distinctions are encoded in the dataset, though the speaker's variety is representative of the Adamawa regional norm.

Alphabet

The transcriptions follow the standard Fulfulde writing conventions adopted for Adamawa Fulfulde. These build on the Latin alphabet and add the following characters: ɓ, ɗ, ƴ, ŋ, ɲ, ʼ. Vowel length is marked by doubling (e.g., aa, oo, uu, ee, ii).
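When preparing a text front end for a TTS or ASR model, it can help to check how often the additional characters occur in the transcriptions. The following is a minimal sketch, not part of the dataset tooling: the function name `special_char_counts` is hypothetical, and the character set is simply copied from the list above. Text is normalised to NFC and lowercased first so that uppercase forms such as Ɓ are counted with their lowercase counterparts.

```python
import unicodedata
from collections import Counter

# Additional letters listed in this card (copied from the Alphabet section).
SPECIAL_CHARS = set("ɓɗƴŋɲʼ")


def special_char_counts(text):
    """Count occurrences of the additional Fulfulde letters in a string.

    The text is NFC-normalised and lowercased before counting, so that
    uppercase variants are folded into their lowercase forms.
    """
    normalized = unicodedata.normalize("NFC", text).lower()
    return Counter(ch for ch in normalized if ch in SPECIAL_CHARS)


# Example with two transcriptions taken from the Sample section.
counts = special_char_counts("Hmm booɗɗum.")
counts += special_char_counts("Kadi boo o sakkini daande o wii :")
print(counts)
```

Running the same function over every sentence in the mapping file gives a quick character inventory, which is useful for deciding which symbols a model vocabulary must cover.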

Source

This dataset was compiled as part of a Text-to-Speech data collection initiative for Adamawa Fulfulde. Audio recordings were produced using a structured prompting methodology in which the speaker read prepared sentences and passages in Adamawa Fulfulde. The recordings were made over several months and subsequently curated, deduplicated and aligned with their corresponding transcriptions.

Domain

The dataset consists of prompted read speech in Adamawa Fulfulde. The texts cover a broad range of everyday topics and narrative genres, including storytelling, dialogue, description and conversational interaction, offering good prosodic and lexical diversity for TTS model training.

Size

Total size of MP3 audio: 176 MB
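
The figures stated in this card (1,303 clips, 3 hours 31 minutes 18.92 seconds, 176 MB) also imply an average clip length and an approximate average bitrate, which can be useful when planning training batches. The sketch below only does that arithmetic. It assumes 1 MB = 1,000,000 bytes, which the card does not specify, so the bitrate is a rough estimate.

```python
# Figures taken from this card.
NUM_CLIPS = 1303
HOURS, MINUTES, SECONDS = 3, 31, 18.92
SIZE_MB = 176

# Assumption: 1 MB = 1,000,000 bytes (the card does not say which convention is used).
BYTES_PER_MB = 1_000_000

total_seconds = HOURS * 3600 + MINUTES * 60 + SECONDS
average_clip_seconds = total_seconds / NUM_CLIPS
average_kbit_per_second = SIZE_MB * BYTES_PER_MB * 8 / total_seconds / 1000

print(f"Total duration: {total_seconds:.2f} s")
print(f"Average clip length: {average_clip_seconds:.2f} s")
print(f"Approximate average bitrate: {average_kbit_per_second:.0f} kbit/s")
```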

Structure

This dataset comprises audio clips and an audio/text mapping file. There are 1,303 audio clips in MP3 format, totalling 3 hours, 31 minutes and 18.92 seconds. The dataset includes one audio/text mapping file, Mapping_MP3.tsv, containing 1,302 aligned audio/sentence pairs.
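
Because the clip count (1,303) and the number of aligned pairs (1,302) differ, it is worth checking which clips have a transcription before training. The following is a minimal sketch under stated assumptions: it assumes Mapping_MP3.tsv has two tab-separated columns (file name, then sentence) and no header row, as the Sample section suggests but does not confirm. The helper names `parse_mapping` and `unmapped_clips` are hypothetical.

```python
import csv
import io


def parse_mapping(tsv_text):
    """Parse 'filename<TAB>sentence' rows into a dict (assumed layout)."""
    reader = csv.reader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # keep quotes and apostrophes in sentences literal
    )
    mapping = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        mapping[row[0].strip()] = row[1].strip()
    return mapping


def unmapped_clips(audio_names, mapping):
    """Return the audio file names that have no transcription row, sorted."""
    return sorted(name for name in audio_names if name not in mapping)


# Small in-memory demonstration with made-up file names.
demo_tsv = "a.mp3\tHmm booɗɗum.\nb.mp3\tTakala mulus, takkaande mus\n"
demo_mapping = parse_mapping(demo_tsv)
print(unmapped_clips(["a.mp3", "b.mp3", "c.mp3"], demo_mapping))  # prints ['c.mp3']
```

With the real data, the TSV text could be read with `pathlib.Path("Mapping_MP3.tsv").read_text(encoding="utf-8")` and the audio names collected from wherever the MP3 files are stored. The location of the clips is not specified in this card.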

Sample

00164611bb8a70b97b42d681514dab2d.mp3 Laamɗo kadi, a wii no yoo a yiɗaa debbo on, naa a semti laamɗo, laari ndaynuɗa mo naa?
00c8f1bad9e8f7b7364a5c7b6ff2d905.mp3 Hmm booɗɗum.
00de24224588692e0e94d4df8dba2776.mp3 Kadi boo o sakkini daande o wii :
0120d1d1a43e102e06b4671fe819cfbd.mp3 Takala mulus, takkaande mus
01365d94e3f176aa59c81749573e329d.mp3 baaba goo hokki mo limce, o wii : hokkoram puccu mi ƴamtonoyte nyamaande ndee.
015580bb191df49db97c9071c012f3be.mp3 Jam, caycayɗo goo hooci jiire bee gaaraaji mum, hooci, hooci, hooci haa ɓaawo dilli, yehi caka cak maayo nii ta'i gaaraaji goo, acci jiire goo nder ndiyam.
0183b32da92db698ce0a14cecce8bd39.mp3 o ɗon suɓta, hamman ɗon taada Mbuulu muuɗum.
01bbd04dc00c5a5de1874c5b3d146d8a.mp3 Suy o hooci o waati haa booro maako goo, booro maako boo jakan ɗon bee wurde. Debbo maako ummake ɗon dilla, kanko boo o wakkake booro maako bee jawngal gootal nder ton goo, o ɗon yaawa haa o huuca o wula o ƴakka kadi,
0227b8c7014b2ba5b9a07ca0136fe587.mp3 ɓe mbii : ɗoo nii teema goɗɗo ɓe ngasi ɓe ngasi ɓe tawi puldebbo goo,
0108230a534d321f05a7c4932f3c64f7.mp3 Yoo nde o hooci ɓinngel goo, o ɗon jogi, o ɗon jogi, o ɗon jogi haa ɓinngel mawni, haa waɗi enɗi nii, gorko wari muuyi ɗum ɓaŋi, dilliri haa lesdi ndaayiindi.