License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Dataset ID:
cmqj8pki201kgnq074r5teseq
Task: TTS
Release Date: 6/18/2026
Format: MP3, TSV
Size: 28.45 MB
Sample-Ghomala-TTS-Dataset is a scripted speech dataset dedicated to the documentation and technological development of Ghomala (ISO 639-3: bbj), a Grassfields Bantu language spoken in the West Region of Cameroon. It was compiled within the framework of the Mozilla Data Collective initiative (2026). The dataset comprises 997 high-quality audio recordings (MP3 format) of Ghomala sentences read by a native speaker across 10 recording sessions, together with per-session sentence-to-audio mapping files that link each transcribed sentence to its audio clip. Sentences were drawn from a scripted speech prompt list and read in a controlled environment. All sentences are transcribed in the General Alphabet of Cameroon's Languages (AGLC, from the French Alphabet Général des Langues du Cameroun), the reference standard for Cameroonian national languages.
The Ghomala orthography used in this dataset has an extended vowel inventory, including the schwa ə (mid central unrounded vowel), the open-mid front unrounded vowel ɛ, the open-mid back rounded vowel ɔ, and the high central rounded vowel ʉ (barred u), as well as the digraph aə, which functions as a distinct complex vowel grapheme in Ghomala roots and affixes. Its rich consonant inventory comprises labio-velar stops (kp), labiodental obstruents (pf, bv), a velar fricative digraph (gh), postalveolar digraphs (sh, zh), affricates (c, ts, dz), labialized consonants (gw, gwy, cw), palatalised consonants (cy, shy as in shyə), and an extensive series of prenasalised and nasal-onset clusters (m+C and n+C, yielding forms such as mh, mn, mk, mgh, mf, mc, ms, mt, mzh, mj, mw, nt, nk, nw, nj, ny). The orthography also uses a multi-register tone-marking system that combines level (acute, grave) and contour (caron, circumflex) diacritics on all nine vowels and on syllabic nasals, and the apostrophe (', U+0027) to mark glottal closure.
The parallel availability of AGLC-transcribed text and aligned speech makes the dataset suitable for a wide range of applications, including text-to-speech (TTS) synthesis, automatic speech recognition (ASR), forced alignment, pronunciation modelling, and language learning tools. It also directly supports efforts to standardise and normalise the digital representation of Ghomala in language technology contexts.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
By downloading this dataset, you agree:
- To use it for research and scientific purposes only
- That you will not re-host or re-share this dataset
Forbidden Usage
You agree not to use the data for: determining the identity of any speaker in the dataset; attempting to clone any voice or train models that imitate any speaker in this dataset; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
(a) Speech-related tasks:
- Text-to-speech (TTS) synthesis: The dataset provides clean sentence–audio pairs from multiple recording sessions and is directly suited for training, fine-tuning, and evaluating speech synthesis models for Ghomala. The AGLC-transcribed sentences with aligned audio enable the development of TTS systems capable of producing natural-sounding Ghomala speech.
- Automatic speech recognition (ASR): Audio–text alignment enables the training and evaluation of speech recognition models for Ghomala. The per-session structure and controlled recording conditions make the dataset suitable for building and evaluating ASR models for this under-resourced language. Sessions 01–07 are explicitly designated for ASR use.
- Speech–text alignment / forced alignment benchmarking: Sentence-level audio–text pairs provide reference material for evaluating phoneme- or word-level aligners adapted to the tonal Grassfields Bantu languages of the West Region of Cameroon.
- Pronunciation modelling: The AGLC-transcribed sentences, combined with aligned audio, provide a resource for developing grapheme-to-phoneme (G2P) models and pronunciation lexicons for Ghomala, including its complex consonant clusters and tonal system.
(b) Linguistic and lexicographic tasks:
- Phonological analysis: The dataset enables systematic study of the phonological and tonal system of Ghomala (jo parler), including its multi-register tone system, extended vowel inventory (ə, ɛ, ɔ, ʉ), the complex vowel digraph aə, rich consonant inventory (kp, pf, bv, gh, sh, zh, ts, dz, gw, gwy, c), and extensive nasal-onset cluster series (m+C, n+C).
- Orthographic standardisation and normalisation: The dataset can serve as a reference corpus for evaluating and training text normalisation models aligned with the AGLC standard for Ghomala. It attests the standard ASCII apostrophe (U+0027) as the glottal stop marker in Ghomala AGLC transcription.
- Language documentation: The dataset contributes to the digital documentation of Ghomala scripted speech in AGLC orthography, supporting efforts to extend the digital presence of this Grassfields Bantu language of the West Region of Cameroon and to establish computational resources for one of the most widely spoken Bamiléké languages.
Ghomala (ISO 639-3: bbj) is a Grassfields Bantu language belonging to the Niger-Congo phylum, classified within the Mbam-Nkam branch of the Bantoid family. It is spoken in the West Region of Cameroon, predominantly in the Mifi department (département de la Mifi), and is one of the largest Bamiléké languages by number of speakers. Despite its sociolinguistic significance within the Bamiléké cultural area of Cameroon, Ghomala remains substantially underrepresented in language technology resources.
According to the Administrative Atlas of Cameroon's Languages (Breton & Bikia Fohtung 1991), Ghomala comprises four dialect sub-areas:
ghɔmala-nord: parlers fʉ'sap (Bafoussam) and laŋ (Baleng)
ngemba (ghɔmala-ouest): parlers mugum (Bamugum), meka (Bameka) and mɔnjɔ (Bamenju)
ghɔmala-central: parlers jo (Bandjoun), we (Bahuan), hɔm (Baham) and yogam (Bayangam)
ghɔmala-sud: parlers tɛ' (Batiɛ), pa (Bapa) and denkwop (Badenkop)
The present dataset represents the ghɔmala-central sub-area, specifically the jo parler of Bandjoun.
The writing system used for the transcription of Ghomala in this dataset is the General Alphabet of Cameroon's Languages (AGLC). The AGLC provides a phonologically motivated orthographic standard for Cameroonian national languages and serves as the reference framework for Ghomala literacy materials.
The vowel system attested in the dataset includes the following oral vowels:
a, e, ə, ɛ, i, o, ɔ, u, ʉ
Where:
ə (schwa, U+0259): mid central unrounded vowel; one of the highest-frequency vowels in the dataset
ɛ (epsilon, U+025B): open-mid front unrounded vowel
ɔ (open-o, U+0254): open-mid back rounded vowel
ʉ (barred u, U+0289): high central rounded vowel
The digraph aə functions as a distinct complex vowel grapheme in Ghomala, frequently appearing in roots and affixes (e.g., gaə̂, gaə̂kə́, da'gaə́, maə́, yaə̌, pfaə̌). Tone diacritics are applied to either or both elements of this sequence as required.
Long vowels may be represented by vowel repetition in certain morphological contexts (e.g., áa as a sentence-final particle; pa'aa, ta'a in extended forms).
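Claims such as the high frequency of ə, and the treatment of aə as one vowel unit, can be checked directly against the transcriptions. The sketch below is a minimal illustration that assumes the sentences are already available as Python strings (for example, read from the sentence column of mapping.tsv); `strip_tones` and `count_vowels` are hypothetical helper names. It removes the four tone diacritics through Unicode NFD decomposition and matches aə greedily, so it will also merge an a followed by ə across a syllable boundary.

```python
import unicodedata
from collections import Counter

# The nine oral vowel letters listed in the dataset card.
VOWELS = set("aeəɛioɔuʉ")
# Tone diacritics as combining marks: acute, grave, circumflex, caron.
TONE_MARKS = {"\u0301", "\u0300", "\u0302", "\u030C"}


def strip_tones(text: str) -> str:
    """Drop tone diacritics but keep base letters (NFD, filter, NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def count_vowels(sentences) -> Counter:
    """Count vowel graphemes, treating the digraph aə as one complex vowel."""
    counts = Counter()
    for sentence in sentences:
        s = strip_tones(sentence.lower())
        i = 0
        while i < len(s):
            if s.startswith("aə", i):  # complex vowel grapheme
                counts["aə"] += 1
                i += 2
            elif s[i] in VOWELS:
                counts[s[i]] += 1
                i += 1
            else:  # consonants, spaces, punctuation, apostrophes
                i += 1
    return counts


print(count_vowels(["Pfaə̌ byâtà cʉ́m bɛ́"]).most_common())
# → [('a', 2), ('aə', 1), ('ʉ', 1), ('ɛ', 1)]
```

Run over all 997 sentences, the same function gives the corpus-wide distribution; tones are deliberately ignored here so that, for example, á, à and â all count as a.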
The consonant inventory reflected in the dataset includes simple, digraph, labialized, palatalised, and nasal-onset consonants:
b, bv, c, cw, cy, d, dj, dz, f, g, gh, gw, gwy, h, j, k, kh, kp, l, m, mb, n, nd, ng, nj, nk, nt, nw, ny, ŋ, p, pf, s, sh, t, ts, v, w, y, z, zh
Notable consonantal features:
gh: voiced velar fricative [ɣ] (e.g., ghɔ́, ghə, ghɔm)
sh: voiceless postalveolar fricative [ʃ] (e.g., shimnyə, shyə, shə́)
zh: voiced postalveolar fricative [ʒ] (e.g., zhʉ́zhʉ̂m, zhí'tə, mzhəŋ)
kp: voiceless labio-velar stop (e.g., wɔ́kpə)
pf: voiceless labiodental affricate (e.g., pfaə̌, pfʉ́tə́, pfə̂)
bv: voiced labiodental affricate (e.g., bvʉ̂m, bvʉ)
ts: voiceless alveolar affricate (e.g., tsʉ', tsə)
dz: voiced alveolar affricate (e.g., dzʉ̂, fɔkdzʉ, dzə̀)
c: palatal stop [c] or postalveolar affricate [tʃ] (e.g., cyə, cwə, cʉ')
gw, gwy: labialized voiced velar stop (e.g., gwyə̌, gwəp, gwp)
kh: aspirated velar stop or velar fricative (e.g., khʉ̂, khʉ)
cw, cy: labialized and palatalised affricates (e.g., cwəlɔ̌, cyə̌pa')
Nasal-onset clusters constitute a distinctive structural feature of Ghomala. They are formed by combining the nasals m or n with a following consonant, yielding sequences such as mh (mhɔgnə), mn (mnɔm, mnə́), mk (mkwɛ́nyə̀, mkámtə́), mgh (mghətsə́, mghɛ̌və́), mf (mfʉ̌), mc (mcʉ̀m, mco'), ms (msəku, msətùk), mt (mtap, mtɔ, mtâp), mzh (mzhəŋ, mzhinyə), mj (mjwǐ, mjyə̂), mw (mnwə, mwə), nt (ntǎknyə, ntɔ̌knyə́), nk (nkáp, nkwítə́), nw (nwə́), nj (njɔm). These clusters include prenasalised stops as well as nasal-initial consonant clusters arising from the combination of nominal class prefixes with root-initial consonants, and they appear throughout the dataset.
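For G2P or alignment work, the multi-letter graphemes listed above (gwy, mgh, pf, aə and so on) have to be segmented before letters can be mapped to sounds. The sketch below uses a greedy longest-match pass over an inventory assembled from this card's lists. It is a heuristic assumption, not the dataset's own tokenisation: tone marks are stripped first (they can be extracted as a separate tier), and greedy matching may merge letters that belong to different syllables or morphemes. `tokenize_word` is a hypothetical name.

```python
import unicodedata

# Consonant graphemes from the card's inventory list.
CONSONANTS = (
    "b bv c cw cy d dj dz f g gh gw gwy h j k kh kp l m mb n nd ng nj nk "
    "nt nw ny ŋ p pf s sh t ts v w y z zh"
).split()
# m + C nasal-onset clusters cited in the card.
NASAL_CLUSTERS = "mh mn mk mgh mf mc ms mt mzh mj mw".split()
# Oral vowels plus the complex vowel digraph aə.
VOWELS = "a e ə ɛ i o ɔ u ʉ aə".split()
# The apostrophe (U+0027) is kept as the glottal stop grapheme.
INVENTORY = set(CONSONANTS + NASAL_CLUSTERS + VOWELS + ["'"])
MAX_LEN = max(len(g) for g in INVENTORY)


def strip_marks(text: str) -> str:
    """Remove combining marks (tone diacritics) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize_word(word: str) -> list[str]:
    """Greedy longest-match segmentation of one word into graphemes."""
    s = strip_marks(word.lower())
    tokens, i = [], 0
    while i < len(s):
        # Try the longest candidate first so "gwy" wins over "gw" and "g".
        for size in range(min(MAX_LEN, len(s) - i), 0, -1):
            piece = s[i:i + size]
            if piece in INVENTORY:
                tokens.append(piece)
                i += size
                break
        else:  # no inventory match: keep the unknown symbol as its own token
            tokens.append(s[i])
            i += 1
    return tokens


for word in ["Mghɛ̌və́", "pfaə̌", "gwyə̌", "da'gaə́"]:
    print(word, tokenize_word(word))
```

Whether m+C clusters should be single units or a nasal plus a consonant is a modelling choice; removing `NASAL_CLUSTERS` from the inventory gives the second behaviour.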
Special symbols:
ŋ (eng, U+014B): velar nasal consonant (e.g., ŋkáp, ŋwak, ŋwə̌)
' (apostrophe, U+0027): glottal stop / glottal closure marker (e.g., lá', cʉ', bɔ'ɔ, da'gaə́, bâ'ba', na'kə́tâm, zhi'tə)
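Because the glottal stop is written with a plain keyboard character, text prepared in word processors or on other keyboards may contain look-alike substitutes, such as a curly quote in place of U+0027. A minimal normalisation sketch follows. The confusables table is a hypothetical starting point, not part of the AGLC standard or of any dataset tooling: it maps look-alikes to the code points given in this card and then applies Unicode NFC, so precomposed and decomposed tone-marked vowels compare equal.

```python
import unicodedata

# Hypothetical confusables table: look-alike characters mapped to the code
# points used in this card. Extend it as new substitutions are found.
CONFUSABLES = str.maketrans({
    "\u2019": "'",       # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE (U+0027)
    "\u2018": "'",       # LEFT SINGLE QUOTATION MARK -> APOSTROPHE
    "\u02BC": "'",       # MODIFIER LETTER APOSTROPHE -> APOSTROPHE
    "\u01DD": "\u0259",  # LATIN SMALL LETTER TURNED E -> SCHWA (U+0259)
    "\u03B5": "\u025B",  # GREEK SMALL LETTER EPSILON -> OPEN E (U+025B)
})


def normalize_aglc(text: str) -> str:
    """Replace look-alike characters, then compose to Unicode NFC."""
    return unicodedata.normalize("NFC", text.translate(CONFUSABLES))


raw = "c\u0289\u2019 cw\u01DDl\u0254\u030C"  # curly quote and turned e
print(normalize_aglc(raw) == "c\u0289' cw\u0259l\u0254\u030C")  # → True
```

Storing text in NFC or NFD is a project decision; what matters is applying the same form to training transcripts and to inference input.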
Ghomala attests syllabic nasal consonants that function as independent phonological units and tone-bearing positions. The nasals m, n, and ŋ appear in syllabic function and may carry tone diacritics. The eng ŋ also functions as a nasal onset in clusters such as ŋk and ŋw.
Ghomala is a tonal language with multiple contrastive pitch levels and contour tones. The dataset employs systematic tone marking on vowels and syllabic nasals in accordance with the AGLC convention. The following diacritics are attested in the dataset:
Level tones:
High tone (H): acute accent — á, é, ə́, ɛ́, í, ó, ɔ́, ú, ʉ́
Low tone (L): grave accent — à, è, ə̀, ɛ̀, ì, ò, ɔ̀, ù, ʉ̀
Contour tones:
Falling tone (HL): circumflex — â, ê, ə̂, ɛ̂, î, ô, ɔ̂, û, ʉ̂
Rising tone (LH): caron — ǎ, ě, ə̌, ɛ̌, ǐ, ǒ, ɔ̌, ǔ, ʉ̌
Mid tone is generally left unmarked in the Ghomala AGLC orthography. Tone marking is systematic and applies to all nine vowels (a, e, ə, ɛ, i, o, ɔ, u, ʉ) as well as to syllabic nasals. Several tone diacritics may occur within a sequence of adjacent vowels (as in the digraph aə) and across morpheme boundaries.
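In Unicode terms, after NFD decomposition each tone is a combining mark that follows its vowel or nasal: U+0301 (acute, H), U+0300 (grave, L), U+0302 (circumflex, HL) and U+030C (caron, LH). The sketch below, built around a hypothetical helper `tone_sequence`, extracts one tone label per tone-bearing letter. Two simplifications are assumptions: unmarked vowels are labelled "unmarked" rather than assigned a tone, following the note that mid tone is generally left unmarked, and m, n and ŋ are reported only when they carry a diacritic, because this simple sketch cannot tell an unmarked syllabic nasal from a consonantal one.

```python
import unicodedata

# Tone labels for the combining diacritics described in the card.
TONE_OF_MARK = {
    "\u0301": "H",   # acute: high
    "\u0300": "L",   # grave: low
    "\u0302": "HL",  # circumflex: falling
    "\u030C": "LH",  # caron: rising
}
VOWELS = set("aeəɛioɔuʉ")
NASALS = set("mnŋ")  # may be syllabic and tone-bearing


def tone_sequence(word: str) -> list[tuple[str, str]]:
    """Return (base letter, tone label) pairs for tone-bearing letters.

    If several tone marks are stacked on one letter, only the first is kept.
    """
    chars = unicodedata.normalize("NFD", word.lower())
    result = []
    for i, ch in enumerate(chars):
        if ch not in VOWELS and ch not in NASALS:
            continue
        # Collect tone marks among the combining marks that follow this letter.
        tones = []
        j = i + 1
        while j < len(chars) and unicodedata.combining(chars[j]):
            if chars[j] in TONE_OF_MARK:
                tones.append(TONE_OF_MARK[chars[j]])
            j += 1
        if tones:
            result.append((ch, tones[0]))
        elif ch in VOWELS:
            result.append((ch, "unmarked"))
    return result


print(tone_sequence("Gɛ̀là'tə̀"))  # → [('ɛ', 'L'), ('a', 'L'), ('ə', 'L')]
print(tone_sequence("gaə̂"))      # → [('a', 'unmarked'), ('ə', 'HL')]
```

A tier like this can be paired with a tone-stripped grapheme sequence, keeping segmental and tonal information separate for pronunciation modelling.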
The dataset was compiled from scripted speech prompt lists read by a native speaker of Ghomala (jo parler, Bandjoun). Sentences were selected to provide broad phonological coverage and were transcribed in accordance with the AGLC orthographic standard.
The dataset represents scripted speech in Ghomala (jo parler, Bandjoun variety), covering a broad range of everyday sentence types drawn from a general-purpose TTS/ASR prompt list. All utterances are scripted rather than spontaneous.
Total audio duration: 3,564 seconds (59m 24s), distributed across 997 accepted audio clips in 10 recording sessions (MP3 format). An additional 15 audio files (1m 07s) are retained in a dedicated subfolder within session tts_dataset_bbj_08, corresponding to rejected recording attempts. Total dataset size (audio + mapping files): approximately 60 MB.
The dataset is organised into 10 recording sessions:
Session asr-tts_dataset_bbj_01: 100 clips (3m 26s)
Session asr-tts_dataset_bbj_02: 98 clips (3m 22s)
Session asr-tts_dataset_bbj_03: 99 clips (3m 43s)
Session asr-tts_dataset_bbj_04: 100 clips (6m 16s)
Session asr-tts_dataset_bbj_05: 100 clips (6m 51s)
Session asr-tts_dataset_bbj_06: 100 clips (5m 27s)
Session asr-tts_dataset_bbj_07: 100 clips (6m 45s)
Session tts_dataset_bbj_08: 100 clips (7m 10s)
Session tts_dataset_bbj_09: 100 clips (8m 38s)
Session tts_dataset_bbj_10: 100 clips (7m 46s)
Sessions 01–07 are designated for both ASR and TTS use (prefix asr-tts_dataset_bbj); sessions 08–10 are designated for TTS use only (prefix tts_dataset_bbj).
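The per-session figures can be cross-checked against the stated totals of 997 clips and 3,564 seconds. The sketch below only uses the numbers copied from the list above. It does not read any audio, and each listed duration is already rounded to the second.

```python
import re

# Clip counts and durations per session, copied from the list above.
SESSIONS = {
    "asr-tts_dataset_bbj_01": (100, "3m 26s"),
    "asr-tts_dataset_bbj_02": (98, "3m 22s"),
    "asr-tts_dataset_bbj_03": (99, "3m 43s"),
    "asr-tts_dataset_bbj_04": (100, "6m 16s"),
    "asr-tts_dataset_bbj_05": (100, "6m 51s"),
    "asr-tts_dataset_bbj_06": (100, "5m 27s"),
    "asr-tts_dataset_bbj_07": (100, "6m 45s"),
    "tts_dataset_bbj_08": (100, "7m 10s"),
    "tts_dataset_bbj_09": (100, "8m 38s"),
    "tts_dataset_bbj_10": (100, "7m 46s"),
}


def to_seconds(duration: str) -> int:
    """Convert a duration such as "3m 26s" into seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m (\d+)s", duration).groups()
    return 60 * int(minutes) + int(seconds)


total_clips = sum(clips for clips, _ in SESSIONS.values())
total_seconds = sum(to_seconds(d) for _, d in SESSIONS.values())
# Sessions 01-07 (prefix asr-tts_) are designated for ASR as well as TTS.
asr_seconds = sum(to_seconds(d) for name, (_, d) in SESSIONS.items()
                  if name.startswith("asr-tts_"))

print(total_clips, total_seconds, divmod(total_seconds, 60), asr_seconds)
# → 997 3564 (59, 24) 2150
```

The totals match the stated 997 clips and 59m 24s. The ASR-designated sessions account for 2,150 s (35m 50s), and the TTS-only sessions 08–10 account for the remaining 1,414 s (23m 34s).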
Each session folder contains:
Audio clips (MP3 format)
One per-session sentence-to-audio mapping file (mapping.tsv) with 4 columns:
#audio_filename: filename of the audio clip (MP3)
#key: unique hash identifier of the recording
#sentence: sentence text as read by the speaker, transcribed in AGLC orthography
#attempts: number of recording attempts before acceptance
Session tts_dataset_bbj_08 additionally contains an attempts subfolder holding 15 rejected recording takes (total duration: 1m 07s).
Sample entries (audio_filename and sentence columns only):
| audio file | sentence (Ghomala, AGLC) |
|---|---|
| f8e8e61ab4a258ade7423cecf7070d71.mp3 | Wɔ́kpə wɛ́ gɔ ghɔ́ lá' |
| bc738a28dd4e04f3197fa3f52974a5a3.mp3 | Pfaə̌ byâtà cʉ́m bɛ́ |
| 0a6b821936241f07e629230ecf7a5503.mp3 | zhi'tə guŋ á Jo |
| 48f9f388280da7e4d27bef3894aa850f.mp3 | Gɛ̀là'tə̀ ŋwak səkú |
| aa4e0a61749423f02ac7260f5442edfa.mp3 | Á wə cə́ŋ ghə́ ghɔm bǐ lɔ́yà. |
| 453ccff43fc44163c177d6536ac00028.mp3 | Zhʉ́zhʉ̂m bə́ kə́ ? |
| f773c536b8c53caffab688e8860c8ea1.mp3 | Mghɛ̌və́ wə́ shimnyə pǒm səku mfʉ̌ puá bɔ'ɔ nə́ kam fa' |
| ca3876772fc304b6662768d2dfc9afc7.mp3 | Pə́ ǒ gɔ kwipnyə yəŋ ma |
| 2282e5caff7d5de5f882a5edc98abfde.mp3 | Tə́ da'gaə́ é nɔ̂k é bə́ tə́ ghəm, bə́ á gɔ cʉ' pə́ ywə dyɛ' |
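Below is a sketch for loading the mapping files. It assumes a layout that this card implies but does not spell out: every session is a folder under one root directory, its mapping.tsv has a header row naming the four columns described above (with or without the leading #), and each audio_filename resolves relative to its session folder. `load_session` and `load_dataset` are hypothetical helper names. Selecting ASR data by the asr-tts_ prefix follows the session designations given above.

```python
import csv
from pathlib import Path


def load_session(session_dir: Path) -> list[dict]:
    """Read one session's mapping.tsv into a list of row dictionaries."""
    rows = []
    with open(session_dir / "mapping.tsv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat any quote characters in sentences as plain text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for raw in reader:
            # Accept header names written with or without a leading "#";
            # skip the None key DictReader uses for surplus fields.
            row = {k.lstrip("#").strip(): v for k, v in raw.items() if k is not None}
            row["attempts"] = int(row["attempts"])
            row["audio_path"] = session_dir / row["audio_filename"]
            row["session"] = session_dir.name
            rows.append(row)
    return rows


def load_dataset(root: Path, task: str = "tts") -> list[dict]:
    """Collect rows from every session usable for `task` ("asr" or "tts")."""
    sessions = sorted(p for p in root.iterdir()
                      if p.is_dir() and "dataset_bbj_" in p.name)
    if task == "asr":
        # Only sessions 01-07 (prefix asr-tts_) are designated for ASR.
        sessions = [p for p in sessions if p.name.startswith("asr-tts_")]
    rows = []
    for session_dir in sessions:
        rows.extend(load_session(session_dir))
    return rows
```

For example, `load_dataset(Path("path/to/dataset"), task="asr")` would return rows from sessions 01–07 only, and `task="tts"` would return rows from all ten sessions. The rejected takes in the attempts subfolder of session 08 are not read, because only mapping.tsv entries are loaded.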