Sample Fe’fe’-TTS-Dataset

Description

Fe'fe'-TTS-Dataset is a scripted speech dataset dedicated to the documentation and technological development of Fe'fe' (ISO 639-3: fmp), a Grassfields Bantu language spoken in the Haut-Nkam Division of the West Region of Cameroon. It was compiled within the framework of the Mozilla Data Collective initiative (2026).

The dataset comprises 1,004 high-quality MP3 audio recordings of Fe'fe' sentences read by a native speaker across 11 recording sessions, together with per-session sentence-to-audio mapping files that align each sentence with its recording. Sentences were drawn from a scripted speech prompt list and read in a controlled environment.

All sentences are transcribed according to the General Alphabet of Cameroon's Languages (AGLC; French: Alphabet Général des Langues Camerounaises), the reference standard for Cameroonian national languages. The Fe'fe' orthography used in this dataset has a rich set of vowel symbols, including the central unrounded vowel α, the mid-central vowel ə, the high central rounded vowel ʉ, and the open-o ɔ. It also uses a five-register tone-marking system that combines level diacritics (acute, macron, grave) and contour diacritics (caron, circumflex, macron-acute, macron-grave), applied to all vowel symbols. Glottal closure is represented by the modifier letter apostrophe (ʼ) and the saltillo (ꞌ).

Because AGLC-transcribed text is available alongside aligned speech, the dataset is suitable for a wide range of applications, including text-to-speech (TTS) synthesis, automatic speech recognition (ASR), forced alignment, pronunciation modelling, and language learning tools. It also directly supports efforts to standardise and normalise the digital representation of Fe'fe' in language technology contexts.
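The tone diacritics named above can be inspected programmatically. The following is a minimal Python sketch, not part of the dataset tooling: it assumes that tone marks are encoded as standard Unicode combining characters (or as precomposed letters that decompose to them), and the function name `tone_mark_counts` is hypothetical. Stacked marks such as macron-acute are counted as their two component marks.

```python
import unicodedata
from collections import Counter

# Combining marks corresponding to the diacritics named in the description.
# Assumption: tone is encoded with these standard Unicode code points.
TONE_MARKS = {
    "\u0301": "acute",
    "\u0304": "macron",
    "\u0300": "grave",
    "\u030C": "caron",
    "\u0302": "circumflex",
}


def tone_mark_counts(sentence: str) -> Counter:
    """Count tone diacritics in a sentence after canonical decomposition.

    NFD splits precomposed letters (for example U+00E9) into a base letter
    plus a combining mark, so both encodings are counted the same way.
    """
    decomposed = unicodedata.normalize("NFD", sentence)
    return Counter(TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS)
```

Counting on the decomposed form keeps the result independent of whether a given text editor saved precomposed or combining sequences, which is one of the practical concerns when normalising Fe'fe' text.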

audio file	sentence (Fe'fe', AGLC)
f9aef1fb2658f626ab1c65d78657d5a8.mp3	pα̌h ntíé ŋwαʼni mα lamsák ghə̌lᾱʼ yǒh.
2b6890ab12e79a408d793dc02e7735b7.mp3	ngα̌ mʉngén tūꞌ nshi pí sēn a.
4173d9385f866861fda2f354c9f1878d.mp3	nsiesi lαhά mfα̌ꞌndʉ́ά mbí ghǎꞌŋwαꞌni lah mbά' mǒ' ngαα.
354debd112b58435b0a77149389f70fb.mp3	wúzά yá' phī tα pō ŋᾱꞌ ntʉ̄ᾱ sīē ά ghαα.
3b58351703fd04a86a379a522d666168.mp3	ngα̌ pén mbᾱ' ó nά ndhī ā ǒ hά le yáá pe'.
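As a rough illustration of how such rows might be consumed, the Python sketch below parses a two-column, tab-separated table with a header row into a filename-to-sentence dictionary and applies NFC normalisation. The two-column layout is assumed from the sample above; the actual per-session mapping files may have different columns. The names `read_mapping` and `glottal_marks` are hypothetical. The second helper only reports which apostrophe-like characters occur, since the sample rows appear to mix several, and it leaves the choice of a canonical form to the user.

```python
import csv
import io
import unicodedata

# Apostrophe-like characters that may stand for glottal closure:
# ASCII apostrophe, right single quote, modifier letter apostrophe, saltillo.
GLOTTAL_CANDIDATES = {"\u0027", "\u2019", "\u02BC", "\uA78C"}


def read_mapping(tsv_text: str) -> dict[str, str]:
    """Parse 'audio file<TAB>sentence' rows (header first) into a dict."""
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    next(reader, None)  # skip the header row, if any
    mapping: dict[str, str] = {}
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        audio_file, sentence = row
        # NFC gives each sentence a single canonical Unicode form.
        mapping[audio_file.strip()] = unicodedata.normalize("NFC", sentence.strip())
    return mapping


def glottal_marks(sentence: str) -> set[str]:
    """Return the apostrophe-like characters present in a sentence."""
    return {ch for ch in sentence if ch in GLOTTAL_CANDIDATES}
```

A call such as `read_mapping(open(path, encoding="utf-8").read())` would return the pairs for one file, where `path` is a placeholder for a local copy of the table.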
