Sample Ngiemboon-TTS-Dataset

Description

Ngiemboon-TTS-Dataset is a scripted speech dataset dedicated to the documentation and technological development of Ngiemboon (ISO 639-3: nnh), a Grassfields Bantu language spoken in the Bamboutos Division of the West Region of Cameroon. It was compiled under the Mozilla Data Collective initiative (2026) to supplement the Common Voice Scripted Speech 25.0 – Ngiemboon dataset (https://mozilladatacollective.com/datasets/cmn1qf3al00xzo107byg0pine).

The dataset comprises 995 high-quality MP3 recordings of Ngiemboon sentences read by a native speaker across 10 recording sessions. Each session includes a sentence-to-audio mapping file that pairs every clip with the sentence read in it, giving utterance-level alignment between text and speech. Sentences were drawn from a scripted speech prompt list and read in a controlled environment.

All sentences are transcribed in the General Alphabet of Cameroon's Languages (AGLC; French: Alphabet Général des Langues Camerounaises), the reference standard for Cameroonian national languages. The Ngiemboon orthography used in this dataset is distinguished by four features: an extended vowel inventory, with the open-mid front unrounded vowel ɛ, the open-mid back rounded vowel ɔ, the high central rounded vowel ʉ, the high central unrounded vowel ɨ (barred i), and the close front rounded vowel ÿ; a series of labialized consonants written by appending ẅ (w with diaeresis) to the base consonant (e.g., kẅ, gẅ, sẅ, zẅ, tsẅ); a multi-register tone-marking system combining level (acute, grave) and contour (caron, circumflex) diacritics on vowels and syllabic nasals; and the modifier letter apostrophe (ʼ) for glottal closure.

Because AGLC-transcribed text and aligned speech are available in parallel, the dataset is suitable for text-to-speech (TTS) synthesis, automatic speech recognition (ASR), forced alignment, pronunciation modelling, and language-learning tools. It also directly supports efforts to standardise and normalise the digital representation of Ngiemboon in language technology contexts.

Language

Ngiemboon (also written Ngyɛmbɔŋ) is a Grassfields Bantu language belonging to the Niger-Congo phylum, classified within the Mbam-Nkam branch. It is spoken in the Bamboutos Division of the West Region of Cameroon. Despite its sociolinguistic significance within Cameroon, Ngiemboon remains substantially underrepresented in language technology resources.

Variants

According to the Administrative Atlas of Cameroon's Languages (Breton & Bikia Fohtung 1991), Ngiemboon comprises the following dialects:

Balatchi
Bamoungong
Bangang
Batcham

Writing System

The writing system used for the transcription of Ngiemboon in this dataset is the General Alphabet of Cameroon's Languages (AGLC). The AGLC provides a phonologically motivated orthographic standard for Cameroonian national languages and serves as the reference framework for Ngiemboon literacy materials.

1. Vowels

The vowel system attested in the dataset includes the following oral vowels:

a, e, i, o, u, ɛ, ɔ, ʉ, ɨ, ÿ

Where:

ɛ (epsilon): open-mid front unrounded vowel
ɔ (open-o): open-mid back rounded vowel
ʉ (barred u): high central rounded vowel
ɨ (barred i): high central unrounded vowel
ÿ (y-diaeresis): close front rounded vowel

Long vowels are represented by vowel doubling (e.g., aa, ɛɛ, ɔɔ, uu, ii).
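Because tone diacritics can sit on doubled letters (e.g., lezyéen, ée in the sample), a length check has to look past them. The following Python sketch is an illustration, not part of the dataset's tooling. It removes the four tone diacritics after Unicode NFD decomposition, keeps the diaeresis so that ÿ survives, and then looks for a vowel letter repeated immediately. The helper names strip_tones and long_vowels are hypothetical.

```python
import re
import unicodedata

# Combining tone diacritics: grave (L), acute (H), circumflex (HL), caron (LH).
TONE_MARKS = "\u0300\u0301\u0302\u030c"
# Oral vowels a e i o u ɛ ɔ ʉ ɨ ÿ (ÿ in its precomposed NFC form, U+00FF).
VOWELS = "aeiou\u025b\u0254\u0289\u0268\u00ff"
DOUBLED_VOWEL = re.compile("([" + VOWELS + r"])\1")


def strip_tones(text: str) -> str:
    """Remove tone diacritics but keep other marks such as the diaeresis."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def long_vowels(word: str) -> list[str]:
    """Return every doubled vowel letter in the word, ignoring tone marks."""
    return [m.group(0) for m in DOUBLED_VOWEL.finditer(strip_tones(word.lower()))]


print(long_vowels("lezyéen"))  # ['ee']
print(long_vowels("ntɔɔn"))    # ['ɔɔ']
```

The sketch treats any two identical adjacent vowel letters as one long vowel. It cannot tell a long vowel from two identical vowels that meet across a morpheme boundary.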

2. Consonants

The consonant inventory reflected in the dataset includes simple, digraph, and labialized consonants:

b, c, d, f, g, h, j, k, l, m, n, p, s, sh, t, ts, v, w, y, z, ŋ

Labialized consonants are formed by appending ẅ (w with diaeresis) to the base consonant or cluster:

kẅ: labialized velar stop
gẅ: labialized voiced velar stop
sẅ: labialized alveolar fricative
zẅ: labialized voiced alveolar fricative
tsẅ: labialized alveolar affricate
nzẅ, nkẅ: labialized nasal-consonant clusters
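For TTS front ends or forced alignment, it is often convenient to split AGLC text into orthographic units instead of code points, so that ts, sh, and the labialized series stay intact. The following Python sketch is a minimal greedy longest-match tokenizer. Its multigraph list mirrors the digraphs and labialized consonants listed above. The function name tokenize is hypothetical, and treating every n + kẅ or n + zẅ sequence as a single unit is an assumption that should be checked against the orthography.

```python
import unicodedata

LAB = "\u1e85"  # ẅ (w with diaeresis), precomposed NFC form

# Multigraphs from the consonant section, sorted longest first so that
# greedy matching prefers tsẅ over ts.
MULTIGRAPHS = sorted(
    ["nz" + LAB, "nk" + LAB, "ts" + LAB, "k" + LAB, "g" + LAB,
     "s" + LAB, "z" + LAB, "sh", "ts"],
    key=len,
    reverse=True,
)


def tokenize(word: str) -> list[str]:
    """Split a word into orthographic units, keeping diacritics on their base."""
    text = unicodedata.normalize("NFC", word.lower())
    tokens = []
    i = 0
    while i < len(text):
        # Longest multigraph starting at position i, else a single character.
        unit = next((g for g in MULTIGRAPHS if text.startswith(g, i)), text[i])
        i += len(unit)
        # Attach any combining marks (e.g. tone diacritics) that follow.
        while i < len(text) and unicodedata.combining(text[i]):
            unit += text[i]
            i += 1
        tokens.append(unit)
    return tokens
```

For example, tokenize("Menkẅɛ̌") yields the units m, e, nkẅ, ɛ̌. NFC normalization runs first, so a ẅ typed as w plus a combining diaeresis is matched the same way as the precomposed letter.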

Special symbols:

ŋ (eng): velar nasal consonant
ẅ (w with diaeresis): labialization marker, appended to consonants
ʼ (modifier letter apostrophe): glottal stop / glottal closure marker
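Some sample sentences in this card use the ASCII apostrophe (') where the glottal marker ʼ is expected (e.g., jʉ̀' and fʉ̀'). Tone-marked letters may also appear either precomposed (á) or as a base letter plus a combining mark. The following Python sketch shows a small normalization pass under stated assumptions; it is not the dataset's own preprocessing. It applies Unicode NFC and maps U+0027 and U+2019 (right single quotation mark) to U+02BC. Because it treats every such apostrophe as a glottal marker, it would be wrong for a transcript that uses the apostrophe as punctuation. The name normalize_aglc is hypothetical.

```python
import unicodedata

# Characters assumed to stand in for the modifier letter apostrophe U+02BC.
APOSTROPHE_FIXES = str.maketrans({"\u0027": "\u02bc", "\u2019": "\u02bc"})


def normalize_aglc(text: str) -> str:
    """NFC-normalize and replace apostrophe look-alikes with U+02BC."""
    return unicodedata.normalize("NFC", text).translate(APOSTROPHE_FIXES)
```

NFC is chosen here because it keeps precomposed letters such as á and ẅ as single code points. Letters with no precomposed form, such as ɛ̌, remain a base letter plus a combining mark in either normalization form.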

3. Syllabic nasals

Ngiemboon attests syllabic nasal consonants that function as tone-bearing units. The following tone-marked syllabic nasals are represented in the dataset:

ḿ (m with acute): syllabic bilabial nasal, high tone
ń (n with acute): syllabic alveolar nasal, high tone
ǹ (n with grave): syllabic alveolar nasal, low tone
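In writing, a syllabic nasal is distinguished only by its tone mark. It can therefore be found with a regular expression over NFD-decomposed text, in which ḿ, ń, and ǹ become m or n followed by a combining acute (U+0301) or grave (U+0300). The following Python sketch illustrates this approach. It would also match a grave-marked m, which is not among the forms listed above. The function name syllabic_nasals is hypothetical.

```python
import re
import unicodedata

# m or n immediately followed by a combining acute (high) or grave (low) accent.
SYLLABIC_NASAL = re.compile("([mn])([\u0301\u0300])")
TONE = {"\u0301": "H", "\u0300": "L"}


def syllabic_nasals(text: str) -> list[tuple[str, str]]:
    """Return (nasal, tone) pairs for each tone-marked m or n in the text."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return [(m.group(1), TONE[m.group(2)]) for m in SYLLABIC_NASAL.finditer(decomposed)]


print(syllabic_nasals("ḿbiŋ ńgʉa na menkàŋ"))  # [('m', 'H'), ('n', 'H')]
```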

4. Tone system

Ngiemboon is a tonal language with multiple contrastive pitch levels and contour tones. The dataset employs systematic tone marking on vowels and syllabic nasals in accordance with the AGLC convention. The following diacritics are attested in the dataset:

Level tones:

High tone (H): acute accent — á, é, í, ó, ú, ɛ́, ɔ́, ʉ́, ɨ́, ÿ́
Low tone (L): grave accent — à, è, ì, ò, ù, ɛ̀, ɔ̀, ʉ̀, ɨ̀, ÿ̀

Contour tones:

Falling tone (HL): circumflex — â, ê, î, ô, û, ɛ̂, ɔ̂, ʉ̂
Rising tone (LH): caron — ǎ, ě, ǐ, ǒ, ǔ, ɛ̌, ɔ̌, ÿ̌

Mid tone is generally left unmarked in the Ngiemboon AGLC orthography.
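Taken together, these conventions make it possible to read a tone sequence off the orthography. The following Python sketch is illustrative only. It labels each vowel letter and each tone-marked syllabic nasal as H, L, HL, or LH according to its diacritic. Unmarked vowels are labelled M, on the assumption that mid tone is unmarked; the orthography description above states this only as a general tendency. Each vowel letter is treated as its own unit, so a doubled vowel such as ée yields two labels. The names clusters and tone_sequence are hypothetical.

```python
import unicodedata

# Tone diacritic (combining mark) -> tone label.
TONE_MARKS = {"\u0301": "H", "\u0300": "L", "\u0302": "HL", "\u030c": "LH"}
# a e i o u ɛ ɔ ʉ ɨ ÿ (ÿ in its precomposed form).
VOWELS = set("aeiou\u025b\u0254\u0289\u0268\u00ff")


def clusters(text: str) -> list[str]:
    """Split NFD text into base characters, each with its following combining marks."""
    out: list[str] = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def tone_sequence(word: str) -> list[str]:
    """Label vowels and tone-marked syllabic nasals with H, L, HL, LH or M."""
    tones = []
    for cluster in clusters(word.lower()):
        base, marks = cluster[0], cluster[1:]
        tone = next((TONE_MARKS[m] for m in marks if m in TONE_MARKS), None)
        # Recompose the base with its non-tone marks so that y + diaeresis becomes ÿ.
        letter = unicodedata.normalize(
            "NFC", base + "".join(m for m in marks if m not in TONE_MARKS)
        )
        if letter in VOWELS:
            tones.append(tone or "M")  # unmarked vowel assumed to be mid
        elif base in ("m", "n") and tone:
            tones.append(tone)  # tone-marked syllabic nasal
    return tones


print(tone_sequence("Lôŋtsyě"))  # ['HL', 'LH']
```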

Source

The dataset was compiled from scripted speech prompt lists read by a native speaker. Sentences were selected to provide broad phonological coverage of Ngiemboon and were transcribed in accordance with the AGLC orthographic standard.

Domain

The dataset represents scripted speech in Ngiemboon, covering a broad range of everyday sentence types drawn from a general-purpose TTS/ASR prompt list. All utterances are scripted rather than spontaneous.

Size

Total audio duration: 7,280 seconds (02h 01m 20s), distributed across 995 MP3 audio clips in 10 recording sessions.
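As a quick consistency check, the stated total in seconds can be converted back to the hours/minutes/seconds figure. Dividing it by the clip count gives the average clip length, which is useful when configuring TTS or ASR training. The following Python sketch uses only the two numbers stated above.

```python
TOTAL_SECONDS = 7_280  # total audio duration stated above
N_CLIPS = 995          # number of MP3 clips stated above

hours, remainder = divmod(TOTAL_SECONDS, 3600)
minutes, seconds = divmod(remainder, 60)
mean_clip = TOTAL_SECONDS / N_CLIPS

print(f"{hours:02d}h {minutes:02d}m {seconds:02d}s")  # 02h 01m 20s
print(f"mean clip length: {mean_clip:.2f} s")         # mean clip length: 7.32 s
```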

Structure

The dataset is organised into 10 recording sessions:

Session tts_dataset_nnh_01: 100 clips (14m 46s)
Session tts_dataset_nnh_02: 100 clips (16m 34s)
Session tts_dataset_nnh_03: 100 clips (10m 28s)
Session tts_dataset_nnh_04: 100 clips (11m 06s)
Session tts_dataset_nnh_05: 100 clips (14m 07s)
Session tts_dataset_nnh_06: 100 clips (13m 24s)
Session tts_dataset_nnh_07: 100 clips (10m 59s)
Session tts_dataset_nnh_08: 100 clips (09m 32s)
Session tts_dataset_nnh_09: 100 clips (07m 56s)
Session tts_dataset_nnh_10: 95 clips (12m 24s)
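The session list can be cross-checked against the Size figures in a few lines of Python. The clip counts add up to the stated 995. The listed durations add up to 7,276 s (02h 01m 16s), which is 4 s less than the stated 7,280 s. This small gap is consistent with the per-session durations having been rounded to whole seconds, but that is an inference, not something the dataset states.

```python
# (clip count, duration) per session, copied from the list above.
SESSIONS = {
    "tts_dataset_nnh_01": (100, "14m 46s"),
    "tts_dataset_nnh_02": (100, "16m 34s"),
    "tts_dataset_nnh_03": (100, "10m 28s"),
    "tts_dataset_nnh_04": (100, "11m 06s"),
    "tts_dataset_nnh_05": (100, "14m 07s"),
    "tts_dataset_nnh_06": (100, "13m 24s"),
    "tts_dataset_nnh_07": (100, "10m 59s"),
    "tts_dataset_nnh_08": (100, "09m 32s"),
    "tts_dataset_nnh_09": (100, "07m 56s"),
    "tts_dataset_nnh_10": (95, "12m 24s"),
}


def to_seconds(duration: str) -> int:
    """Convert a duration written as 'MMm SSs' to seconds."""
    minutes, seconds = duration.replace("m", "").replace("s", "").split()
    return int(minutes) * 60 + int(seconds)


total_clips = sum(clips for clips, _ in SESSIONS.values())
total_seconds = sum(to_seconds(duration) for _, duration in SESSIONS.values())
print(total_clips, total_seconds)  # 995 7276
```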

Each session folder contains:

MP3 audio clips
A sentence-to-audio mapping file (mapping.tsv) with 4 tab-separated columns

Description of columns (mapping.tsv)

#audio_filename: filename of the audio clip (MP3)
#key: unique hash identifier of the recording
#sentence: sentence text as read by the speaker, transcribed in AGLC orthography
#attempts: number of recording attempts before acceptance
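The following Python sketch reads all sessions into a single list. It is minimal and rests on several assumptions: each session folder sits directly under a common root; mapping.tsv is UTF-8 with a header row naming the four columns, with or without the leading # shown above; and sentence text contains no tab characters. The function names load_mapping and load_dataset are hypothetical. Quote handling is disabled so that apostrophes or quotation marks inside sentences are kept verbatim.

```python
import csv
from pathlib import Path


def load_mapping(tsv_path):
    """Read one session's mapping.tsv into a list of dicts."""
    tsv_path = Path(tsv_path)
    rows = []
    # utf-8-sig tolerates an optional byte-order mark at the start of the file.
    with tsv_path.open(encoding="utf-8-sig", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Accept header names with or without a leading '#'.
        header = [name.lstrip("#").strip() for name in next(reader)]
        for values in reader:
            if not values:  # skip blank lines
                continue
            row = dict(zip(header, values))
            row["attempts"] = int(row["attempts"])
            row["audio_path"] = tsv_path.parent / row["audio_filename"]
            rows.append(row)
    return rows


def load_dataset(root):
    """Load every tts_dataset_nnh_* session found directly under root."""
    rows = []
    for tsv in sorted(Path(root).glob("tts_dataset_nnh_*/mapping.tsv")):
        for row in load_mapping(tsv):
            row["session"] = tsv.parent.name
            rows.append(row)
    return rows
```

Calling load_dataset on the dataset's root folder (whatever local path it was unpacked to) returns one dict per clip, with audio_path pointing at the MP3 file next to its mapping.tsv.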

Sample

audio file	sentence (Ngiemboon, AGLC)
d2d7174fe45cf83b8b89c49daad332aa.mp3	Menkẅɛ̌ ndá lezyéen fʉʼ ntsèm
ca2914db43399709f83e39928006a0b3.mp3	Lepǔ ḿbyág mentí fÿàg ntsèm jʉ̀' ntsèm, à fʉ̀' ntsém.
8ef3a54eaebd558ae8752a565abc1e2b.mp3	Atèmte ncwò lɔg ńdiŋ nkʉ̀a lezíŋ lɔgɔ ńgwɔ́ tsɔ̌ tàʼ na meliŋé menkʉ̀a mezíŋ métá
a5ed566a13cb0cc0b039b6d1d4ae1c4a.mp3	ḿbiŋ ńgʉa na menkàŋ, ńtsɔ́ʼ ntɔɔn saŋtí mítà 30
5c71748b491175f5477d39d686822ed9.mp3	Lezíŋ tá pa Lôŋtsyě ée le wɔ̌?
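A character-inventory check is a quick way to catch transcripts that stray from the documented orthography. The following Python sketch builds an allowed set from the letters listed under Writing System, using NFD form so that ÿ and ẅ reduce to y and w plus a diaeresis. It adds the four tone diacritics, the diaeresis, ʼ, the space, and basic punctuation, and reports any other character. Which punctuation to allow is an assumption. Run on the sample rows above, it flags the ASCII apostrophe in the second sentence and the digits of "30" in the fourth, both of which a TTS front end would probably need to normalize or spell out. Because it checks one character at a time, it will not notice a diacritic on an unexpected base letter. The name unexpected_chars is hypothetical.

```python
import unicodedata

# Base letters from the vowel and consonant lists; digraphs and labialized
# consonants are sequences of these, and ÿ / ẅ decompose to y / w + U+0308.
LETTERS = set("abcdefghijklmnopstuvwyz") | set("\u025b\u0254\u0289\u0268\u014b")
# Tone diacritics (grave, acute, circumflex, caron) plus the diaeresis.
MARKS = set("\u0300\u0301\u0302\u030c\u0308")
# Glottal marker ʼ, space, and an assumed set of punctuation.
OTHER = {"\u02bc", " "} | set(".,?!;:")
ALLOWED = LETTERS | MARKS | OTHER


def unexpected_chars(sentence: str) -> set[str]:
    """Return characters (in NFD form) that fall outside the documented inventory."""
    decomposed = unicodedata.normalize("NFD", sentence.lower())
    return {ch for ch in decomposed if ch not in ALLOWED}
```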