License: NOODL-1.0
Steward: Institute of African Digital Humanities
Dataset ID: cmqf69433066kmk07x03adrhk
Task: TTS
Release Date: 6/15/2026
Format: fmp
Size: 62.90 MB
Fe'fe'-TTS-Dataset is a scripted speech dataset for the documentation and technological development of Fe'fe' (ISO 639-3: fmp), a Grassfields Bantu language spoken in the Haut Nkam Division of the Western Region of Cameroon. It was compiled in the framework of the Mozilla Data Collective initiative (2026). The dataset comprises 1,004 high-quality MP3 recordings of Fe'fe' sentences read by a native speaker across 11 recording sessions, together with per-session sentence-to-audio mapping files that link each transcribed sentence to its audio clip. Sentences were drawn from a scripted speech prompt list and read in a controlled environment. All sentences are transcribed in the General Alphabet of Cameroon's Languages (AGLC; French: Alphabet Général des Langues Camerounaises), the reference standard for Cameroonian national languages. The Fe'fe' orthography used here has a rich set of vowel symbols, including the central-to-back unrounded vowel α, the mid-central vowel ə, the high central rounded vowel ʉ, and the open-o ɔ, as well as a tone-marking system that combines level-tone diacritics (acute, macron, grave) with contour-tone diacritics (caron, circumflex, macron-acute, macron-grave) written on the vowel symbols. Glottal closure is represented by the modifier letter apostrophe (ʼ) and the saltillo (ꞌ). Because AGLC-transcribed text and aligned speech are available in parallel, the dataset suits a wide range of applications, including text-to-speech (TTS) synthesis, automatic speech recognition (ASR), forced alignment, pronunciation modelling, and language-learning tools. It also supports efforts to standardise and normalise the digital representation of Fe'fe' in language technology contexts.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
By downloading this dataset, you agree:
- To use it for research and scientific purposes only
- Not to re-host or re-share this dataset
Forbidden Usage
You agree not to use the data for: determining the identity of any speaker in the dataset; attempting to clone any voice or train models that imitate any speaker in this dataset; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
(a) Speech-related tasks:
- Text-to-speech (TTS) synthesis: The dataset provides clean sentence–audio pairs from multiple recording sessions and is suited to training, fine-tuning, and evaluating speech synthesis models for Fe'fe'. AGLC-transcribed sentences with aligned audio support the development of TTS systems that produce natural-sounding Fe'fe' speech.
- Automatic speech recognition (ASR): Paired audio and text support training and evaluating ASR models for Fe'fe'; the per-session structure and controlled recording conditions make the dataset a practical starting point for this under-resourced language.
- Speech–text alignment / forced alignment benchmarking: Sentence-level audio–text pairs provide a basis for developing and evaluating phoneme- or word-level aligners adapted to Grassfields Bantu languages.
- Pronunciation modelling: AGLC-transcribed sentences combined with aligned audio provide a resource for developing grapheme-to-phoneme (G2P) models and pronunciation lexicons for Fe'fe'.
(b) Linguistic and lexicographic tasks:
- Phonological analysis: The dataset enables systematic study of Fe'fe' phonology, including its multi-level tone system with contour tones and the distribution of the special vowels (α, ə, ʉ, ɔ).
- Orthographic standardisation and normalisation: The dataset can serve as a reference corpus for evaluating and training text normalisation models aligned with the AGLC standard for Fe'fe'.
- Language documentation: The dataset contributes to the digital documentation of scripted Fe'fe' speech in AGLC orthography, supporting efforts to extend the digital presence of this Grassfields Bantu language.
Fe'fe' (also known as Nufi) is a Grassfields Bantu language of the Niger-Congo phylum, classified within the Mbam-Nkam (Eastern Grassfields) branch. It is spoken primarily in the Haut Nkam Division of the Western Region of Cameroon, where, according to Ethnologue (https://www.ethnologue.com/language/fmp/), it has a substantial speaker community. Despite its sociolinguistic significance within Cameroon, Fe'fe' remains substantially underrepresented in language technology resources.
According to the Administrative Atlas of Cameroon's Languages (Breton & Bikia Fohtung 1991), Fe'fe' comprises two geographically distributed dialectal areas:
North Fe'fe' dialectal area, comprising four dialect varieties:
La'fi dialect (Balafi)
Tuŋi dialect (Foutouni)
Nkwet dialect (Fondjomekwet)
Ntii dialect (Fondanti)
Central Fe'fe' dialectal area, comprising four dialect varieties:
Njəə-Poantu dialect (Bandja-Babountou)
Fa' dialect (Bafang)
Nka' dialect (Banka)
Nee dialect (Bana)
The writing system used for the transcription of Fe'fe' in this dataset is the General Alphabet of Cameroon's Languages (AGLC), as adopted by the Ministry of Basic Education of Cameroon and regularly updated by the Direction de la Promotion des Langues Nationales. The AGLC provides a phonologically motivated orthographic standard for Cameroonian national languages and serves as the reference framework for Fe'fe' literacy materials.
The vowel system attested in the dataset includes the following oral vowels:
a, e, i, o, u, α, ə, ʉ, ɔ
Where:
α (alpha, IPA ɑ): central to back unrounded vowel, distinct from a
ə (schwa): mid-central vowel
ʉ (barred u): high central rounded vowel
ɔ (open-o): open-mid back rounded vowel
Long vowels are represented by vowel doubling (e.g., aa, αα, oo).
The consonant inventory reflected in the dataset includes simple, prenasalised, and digraph consonants:
b, c, d, f, g, h, j, k, l, m, n, p, s, t, v, w, y, z, ŋ
Digraphs and trigraphs: gh, ph, sh, th, zh, mb, mf, nd, ng, nk, nc, nsh, nzh
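For G2P modelling or alignment it helps to split words into orthographic units rather than single code points. The sketch below assumes greedy longest-match over exactly the multigraphs listed above, with combining tone marks attached to the preceding letter after NFD decomposition; the helper name `segment` is hypothetical, and greedy matching may over-merge where, for example, n and g belong to different syllables.

```python
import unicodedata

# Multigraphs from the consonant inventory above. Sorted longest first so that
# longer units win if overlapping entries are ever added to the list.
MULTIGRAPHS = sorted(
    ["gh", "ph", "sh", "th", "zh", "mb", "mf", "nd", "ng", "nk", "nc", "nsh", "nzh"],
    key=len,
    reverse=True,
)


def segment(word: str) -> list[str]:
    """Split a word into orthographic units (hypothetical helper, greedy longest match)."""
    text = unicodedata.normalize("NFD", word)  # separate tone marks from base letters
    units: list[str] = []
    i = 0
    while i < len(text):
        # Combining marks (tone diacritics etc.) stay with the preceding unit.
        if unicodedata.combining(text[i]) and units:
            units[-1] += text[i]
            i += 1
            continue
        for mg in MULTIGRAPHS:
            if text.startswith(mg, i):
                units.append(mg)
                i += len(mg)
                break
        else:  # no multigraph matched: emit a single character
            units.append(text[i])
            i += 1
    return units


print(segment("nshi"))  # ['nsh', 'i']
```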
Special symbols:
ŋ (eng): velar nasal consonant
ʼ (modifier letter apostrophe) and ꞌ (saltillo): glottal stop / glottal closure marker
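The sample entries at the end of this card appear to use three different characters for glottal closure: ʼ (U+02BC MODIFIER LETTER APOSTROPHE), ꞌ (U+A78C LATIN SMALL LETTER SALTILLO), and the plain ASCII apostrophe (U+0027). Mapping them to one form before training text or speech models may reduce spurious vocabulary splits. A minimal sketch; choosing the saltillo as the canonical character and the name `normalise_glottal` are assumptions, not a documented AGLC or dataset convention.

```python
# Glottal-closure variants observed in the sample sentences (assumption: the full
# corpus may contain others, e.g. U+2019, which would need to be added here).
GLOTTAL_VARIANTS = {"\u02bc", "\ua78c", "'"}

# Hypothetical canonical form: the saltillo.
CANONICAL_GLOTTAL = "\ua78c"


def normalise_glottal(text: str, target: str = CANONICAL_GLOTTAL) -> str:
    """Replace every known glottal-closure variant with a single target character."""
    return "".join(target if ch in GLOTTAL_VARIANTS else ch for ch in text)


print(normalise_glottal("mb\u03ac' m\u01d2'"))
```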
Fe'fe' is a tonal language with multiple contrastive pitch levels and contour tones. The dataset employs systematic tone marking on vowels in accordance with the AGLC convention. The following diacritics are attested in the dataset:
Level tones:
High tone (H): acute accent — á, é, í, ó, ú, ά, ə́, ʉ́, ɔ́
Mid tone (M): macron — ā, ē, ī, ō, ū, ᾱ, ə̄, ʉ̄, ɔ̄
Low tone (L): grave accent — à, è, ì, ò, ù, ὰ, ə̀, ʉ̀
Contour tones:
Rising tone (LH): caron — ǎ, ě, ǐ, ǒ, ǔ, α̌, ə̌, ʉ̌
Falling tone (HL): circumflex — â, ê, î, ô, û
Mid-rising tone (MH): macron-acute diacritic (᷇) written over the vowel
Mid-falling tone (ML): macron-grave diacritic (᷆) written over the vowel
Additional diacritics attested in the dataset include the combining dot below (U+0323), the combining tilde below (U+0330), and the combining diaeresis (U+0308), reflecting fine-grained phonological distinctions in the Fe'fe' sound system.
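Some tone-marked vowels above have precomposed code points (á, or what appear to be Greek-block letters such as ά, ᾱ, ὰ for α), while others exist only as a base letter plus a combining mark (α̌, ə́). Comparing, counting, or stripping tones is therefore easiest after Unicode NFD decomposition. The sketch below maps the five single-mark tones listed above to labels and can remove them to obtain tone-free forms; the mid-rising and mid-falling marks are deliberately left out because their exact code points in the data are not documented here. Helper names are hypothetical.

```python
import unicodedata

# Combining marks for the level and simple contour tones listed above.
TONE_MARKS = {
    "\u0301": "H",   # combining acute accent: high
    "\u0304": "M",   # combining macron: mid
    "\u0300": "L",   # combining grave accent: low
    "\u030c": "LH",  # combining caron: rising
    "\u0302": "HL",  # combining circumflex accent: falling
}


def tone_labels(text: str) -> list[str]:
    """Return tone labels for the tone diacritics in text, in reading order."""
    decomposed = unicodedata.normalize("NFD", text)
    return [TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS]


def strip_tones(text: str) -> str:
    """Remove tone diacritics only; other marks (dot below, tilde below, diaeresis) are kept."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


print(tone_labels("y\u00e1\u00e1 p\u03b1\u030ch"))  # ['H', 'H', 'LH']
```

Decomposing first means the Greek precomposed forms and the base-plus-combining forms yield the same labels, which is also a reasonable first step for the orthographic normalisation use case described earlier.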
The dataset was compiled from scripted speech prompt lists read by native speakers of Fe'fe' in recording sessions held at the École Normale Supérieure de Yaoundé in June 2026, in the framework of the Mozilla Data Collective project. Sentences were selected to provide broad phonological coverage of Fe'fe' and were transcribed in accordance with the AGLC orthographic standard.
The dataset represents scripted speech in Fe'fe', covering a broad range of everyday sentence types drawn from a general-purpose TTS/ASR prompt list. All utterances are scripted rather than spontaneous.
Total audio duration: 4,291 seconds (01h 11m 31s), distributed across 1,004 MP3 audio clips in 11 recording sessions.
The dataset is organised into 11 recording sessions:
Session tts_dataset_fmp_01: 100 clips (07m 23s)
Session tts_dataset_fmp_02: 100 clips (06m 40s)
Session tts_dataset_fmp_03: 100 clips (06m 31s)
Session tts_dataset_fmp_04: 100 clips (06m 21s)
Session tts_dataset_fmp_05: 100 clips (06m 46s)
Session tts_dataset_fmp_06: 100 clips (06m 16s)
Session tts_dataset_fmp_07: 100 clips (06m 52s)
Session tts_dataset_fmp_08: 100 clips (07m 45s)
Session tts_dataset_fmp_09: 100 clips (08m 11s)
Session tts_dataset_fmp_10: 100 clips (08m 21s)
Session tts_dataset_fmp_11: 4 clips (00m 20s)
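The per-session figures can be cross-checked against the totals stated earlier. Summing the listed durations gives 4,286 s, a few seconds below the reported 4,291 s, which is about what rounding each session to whole seconds would produce; the clip count matches exactly. The average clip length works out to roughly 4.3 s. A small sketch of the arithmetic, with the session values copied from the list above:

```python
# Per-session figures from the session list: (clips, minutes, seconds).
SESSIONS = {
    "tts_dataset_fmp_01": (100, 7, 23),
    "tts_dataset_fmp_02": (100, 6, 40),
    "tts_dataset_fmp_03": (100, 6, 31),
    "tts_dataset_fmp_04": (100, 6, 21),
    "tts_dataset_fmp_05": (100, 6, 46),
    "tts_dataset_fmp_06": (100, 6, 16),
    "tts_dataset_fmp_07": (100, 6, 52),
    "tts_dataset_fmp_08": (100, 7, 45),
    "tts_dataset_fmp_09": (100, 8, 11),
    "tts_dataset_fmp_10": (100, 8, 21),
    "tts_dataset_fmp_11": (4, 0, 20),
}
REPORTED_TOTAL_SECONDS = 4291  # total duration stated in this card

total_clips = sum(clips for clips, _, _ in SESSIONS.values())
total_seconds = sum(60 * m + s for _, m, s in SESSIONS.values())


def hms(seconds: int) -> str:
    """Format seconds in the card's 'HHh MMm SSs' style."""
    h, rest = divmod(seconds, 3600)
    m, s = divmod(rest, 60)
    return f"{h:02d}h {m:02d}m {s:02d}s"


print(total_clips, total_seconds, hms(total_seconds))  # 1004 4286 01h 11m 26s
print(hms(REPORTED_TOTAL_SECONDS))                      # 01h 11m 31s
print(f"{REPORTED_TOTAL_SECONDS / total_clips:.2f} s per clip on average")  # 4.27 s per clip on average
```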
Each session folder contains:
MP3 audio clips (100 per session, except session tts_dataset_fmp_11, which has 4 clips)
One sentence-to-audio mapping file (mapping.tsv) with four tab-separated columns:
#audio_filename: filename of the audio clip (MP3)
#key: unique hash identifier of the recording
#sentence: sentence text as read by the speaker, transcribed in AGLC orthography
#attempts: number of recording attempts before acceptance
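A minimal loader for the layout described above, assuming each mapping.tsv starts with a header row naming the four columns (with or without the leading "#"), that audio files sit directly in their session folder, and that session folders live under a common root directory. The function names and the root path are hypothetical.

```python
import csv
from pathlib import Path

# Column names as documented above; a leading "#" in the header is tolerated.
COLUMNS = ("audio_filename", "key", "sentence", "attempts")


def parse_mapping(text: str) -> list[dict]:
    """Parse the contents of one mapping.tsv file (assumed to start with a header row)."""
    # QUOTE_NONE: any double quotes inside sentences are kept as literal text.
    reader = csv.reader(text.splitlines(), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("#").strip() for name in next(reader)]
    missing = set(COLUMNS) - set(header)
    if missing:
        raise ValueError(f"mapping.tsv is missing columns: {sorted(missing)}")
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        row = dict(zip(header, fields))
        row["attempts"] = int(row["attempts"])
        rows.append(row)
    return rows


def load_pairs(root: Path) -> list[tuple[Path, str]]:
    """Collect (audio path, sentence) pairs from every tts_dataset_fmp_* session folder."""
    pairs = []
    for session_dir in sorted(root.glob("tts_dataset_fmp_*")):
        text = (session_dir / "mapping.tsv").read_text(encoding="utf-8")
        for row in parse_mapping(text):
            pairs.append((session_dir / row["audio_filename"], row["sentence"]))
    return pairs
```

Usage would be along the lines of `pairs = load_pairs(Path("path/to/fefe_dataset"))`, where the path is a placeholder for wherever the downloaded folders are stored. Zero-padded session numbers mean the sorted glob returns sessions in recording order.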
Sample entries (audio_filename and sentence columns):
| audio_filename | sentence (Fe'fe', AGLC) |
|---|---|
| f9aef1fb2658f626ab1c65d78657d5a8.mp3 | pα̌h ntíé ŋwαʼni mα lamsák ghə̌lᾱʼ yǒh. |
| 2b6890ab12e79a408d793dc02e7735b7.mp3 | ngα̌ mʉngén tūꞌ nshi pí sēn a. |
| 4173d9385f866861fda2f354c9f1878d.mp3 | nsiesi lαhά mfα̌ꞌndʉ́ά mbí ghǎꞌŋwαꞌni lah mbά' mǒ' ngαα. |
| 354debd112b58435b0a77149389f70fb.mp3 | wúzά yá' phī tα pō ŋᾱꞌ ntʉ̄ᾱ sīē ά ghαα. |
| 3b58351703fd04a86a379a522d666168.mp3 | ngα̌ pén mbᾱ' ó nά ndhī ā ǒ hά le yáá pe'. |