License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Dataset ID:
cmq9hxeod02gjmk07d9pq7jau
Task: ASR
Release Date: 6/11/2026
Format: MP3, TSV
Size: 117.51 MB
Ewondo-ASR-Dataset is a scripted speech dataset dedicated to the documentation and technological development of Ewondo (ISO 639-3: ewo), a Narrow Bantu language spoken primarily in the Centre, South and East Regions of Cameroon, where it also functions as a vehicular language. The dataset was compiled at the École Normale Supérieure de Yaoundé with contributions from students. It comprises 1,781 high-quality MP3 audio recordings of Ewondo sentences read by 16 native speakers across 19 recording sessions, together with per-session sentence-to-audio mapping files that align each transcript with its recording. Sentences were drawn from a scripted speech prompt list and read by each speaker in a controlled environment.
The primary added value of this dataset lies in its orthographic alignment with the General Alphabet of Cameroon's Languages (AGLC, from the French Alphabet Général des Langues Camerounaises), the reference standard for Cameroonian national languages. In particular, the transcriptions preserve systematic tone marking, which the existing Common Voice Scripted Speech 25.0 – Ewondo dataset on the Mozilla Data Collective platform tends to omit. By making tone explicit in the transcription, this dataset supports the development and evaluation of speech technology models that are sensitive to Ewondo's phonemic tonal contrasts. Methodologically, the dataset is designed to complement the existing Common Voice Scripted Speech resource for Ewondo rather than replace it. It extends the total amount of available Ewondo speech data transcribed to an orthographically principled standard.
The parallel availability of AGLC-transcribed text and aligned speech makes the dataset suitable for a wide range of applications, including automatic speech recognition (ASR), text-to-speech (TTS), forced alignment, pronunciation modelling and language learning tools. It also directly supports efforts to standardise and normalise the digital representation of Ewondo in language technology contexts.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
By downloading this dataset, you agree:
- to use it for research and scientific purposes only;
- not to re-host or re-share it.
Forbidden Usage
You agree not to use the data for: determining the identity of any speaker in the dataset; attempting to clone any voice or train models that imitate any speaker in this dataset; generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
(a) Speech-related tasks:
- Automatic speech recognition (ASR): Audio–text alignment enables the evaluation of speech recognition models for Ewondo. Sentences are transcribed in the General Alphabet of Cameroon's Languages (AGLC) with full tone marking. This distinguishes the dataset from the Common Voice Scripted Speech 25.0 – Ewondo resource and makes it particularly suited to building and evaluating tone-aware ASR models.
- Text-to-speech (TTS): The dataset contains clean sentence–audio pairs from multiple speakers and can be used to evaluate or fine-tune speech synthesis models for Ewondo. The AGLC orthographic standard, including tone diacritics, should be taken into account when designing TTS experiments.
- Speech–text alignment / forced alignment benchmarking: The sentence-level audio–text pairs can serve as evaluation material for phoneme- or word-level aligners adapted to tonal Bantu languages.
(b) Translation and multilingual tasks:
- Speech translation (speech-to-text)
(c) Linguistic and lexicographic tasks:
- Phonological and tonal analysis: The systematic tone notation in AGLC orthography makes the dataset suitable for studying tonal alternations, downstep, floating tones and other phonological phenomena in Ewondo.
- Orthographic standardisation and normalisation: The dataset can serve as a reference corpus for evaluating and training text normalisation and grapheme-to-phoneme (G2P) models aligned with the AGLC standard.
- Language documentation: The dataset contributes to the digital documentation of Ewondo scripted speech in AGLC orthography. It extends the existing Common Voice resource with orthographically principled transcriptions.
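For tone-aware ASR evaluation, one simple way to see how much of a model's error comes from tone is to compute the character error rate (CER) twice. First compute it on the AGLC transcripts as written, then again with tone diacritics removed. The following is a minimal standard-library sketch. Several choices are illustrative assumptions, not an evaluation protocol prescribed for this dataset:
- CER is computed over code points in Unicode NFD form, so a missing or wrong tone mark counts as one error.
- The tone mark set is taken from the AGLC convention described in this card.
- The function names are invented for this example.

```python
import unicodedata

# Combining diacritics used for tone in the AGLC convention described in this card:
# acute (H), grave (L), circumflex (HL), caron (LH).
TONE_MARKS = {"\u0301", "\u0300", "\u0302", "\u030C"}


def strip_tones(text: str) -> str:
    """Remove tone diacritics while keeping base letters such as ə and ɔ."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER over NFD code points, so each tone mark is a separately scored symbol."""
    ref_n = unicodedata.normalize("NFD", ref)
    hyp_n = unicodedata.normalize("NFD", hyp)
    return edit_distance(ref_n, hyp_n) / max(len(ref_n), 1)


ref = "Mə̌men makad nə ma yəm"   # reference with a rising tone on the first vowel
hyp = "Məmen makad nə ma yəm"    # hypothetical ASR output that drops the tone
# Tone-sensitive CER is above zero; tone-insensitive CER is 0.0.
print(cer(ref, hyp), cer(strip_tones(ref), strip_tones(hyp)))
```

A large gap between the two scores suggests that a model recovers segments well but misses tone. That is the distinction the AGLC transcriptions make measurable.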
Ewondo is a Narrow Bantu language of the Beti-Fang group, within the Benue-Congo branch. It is spoken primarily in the Centre Region of Cameroon, with significant speech communities in the South and East Regions. Ewondo also functions as a vehicular language in these regions and has given rise to a creolised variety known as Mongo Ewondo. Ethnologue estimates the number of speakers at approximately 900,000, including first- and second-language users. Despite this relatively large speaker base, Ewondo remains significantly underrepresented in language technology resources.
The glossonym 'Ewondo' designates a set of closely related linguistic varieties whose speakers may or may not identify with the label, depending on geographical, social and pragmatic factors. In the framework of the Atlas Linguistique du Cameroun (ALCAM) project, Ewondo is listed as one of the major micro-languages of the Beti-Fang macro-language, alongside Fang, Bulu, Ntumu and Eton. Varieties such as Yezoum (Haut-Nyong Division), Yanda and Moog-Ebanda are considered sub-varieties of Ewondo in standard classifications, though this classification is not always accepted by their speakers.
The present dataset represents speakers of the Ewondo variety as spoken in the Yaoundé area (Centre Region), recruited at the École Normale Supérieure de Yaoundé.
The writing system used for the transcription of Ewondo in this dataset is the General Alphabet of Cameroon's Languages (AGLC), as adopted by the Ministry of Basic Education of Cameroon and regularly updated by the Direction de la Promotion des Langues Nationales. The AGLC provides a phonologically motivated orthographic standard for Cameroonian national languages and serves as the reference framework for Ewondo literacy materials, including those produced by the Catholic and Protestant missionary traditions that have subsequently aligned with this standard.
The vowel system attested in the dataset includes the following oral vowels:
a, e, ə, i, o, u, ɔ
Long vowels are represented by vowel doubling (e.g. aa, ee, oo).
The consonant inventory reflected in the dataset includes simple, prenasalized and digraph consonants:
b, d, dz, f, g, h, k, l, m, mb, mv, n, nd, ng, nk, nz, ny, ŋ, p, s, t, ts, v, w, y, z
Special symbols: ə (mid central vowel), ŋ (velar nasal)
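For G2P or pronunciation-modelling work, the inventory above can be turned into a simple grapheme tokenizer. This Python sketch makes three assumptions:
- The listed digraphs are always read greedily as single graphemes.
- Doubled vowels form one long-vowel grapheme.
- Combining tone marks belong to the preceding letter.

Real Ewondo spelling may need syllable-boundary information that this does not model. For example, it cannot tell a prenasalised ng from n followed by g across a morpheme boundary. The constant and function names are hypothetical.

```python
import unicodedata

# Multigraphs from the consonant inventory listed above, matched greedily
# (assumption: no syllable- or morpheme-boundary information is used).
MULTIGRAPHS = {"dz", "mb", "mv", "nd", "ng", "nk", "nz", "ny", "ts"}
VOWELS = set("aeəiouɔ")


def base_letters(token: str) -> str:
    """Return a token's letters without combining marks (e.g. tone diacritics)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def tokenize(word: str) -> list[str]:
    """Split an AGLC-spelled word into graphemes, keeping tone marks on their letter."""
    chars = unicodedata.normalize("NFD", word.lower())
    tokens: list[str] = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if unicodedata.combining(ch) and tokens:
            tokens[-1] += ch  # attach the diacritic to the preceding grapheme
            i += 1
        elif chars[i:i + 2] in MULTIGRAPHS:
            tokens.append(chars[i:i + 2])
            i += 2
        else:
            tokens.append(ch)
            i += 1

    # Merge doubled vowels (aa, ee, oo, ...) into single long-vowel graphemes.
    merged: list[str] = []
    for tok in tokens:
        if (merged and base_letters(tok) in VOWELS
                and base_letters(merged[-1]) == base_letters(tok)):
            merged[-1] += tok
        else:
            merged.append(tok)
    return [unicodedata.normalize("NFC", t) for t in merged]


print(tokenize("mivɔg"))  # ['m', 'i', 'v', 'ɔ', 'g']
print(tokenize("Ndɔŋ"))   # ['nd', 'ɔ', 'ŋ']
```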
Ewondo is a tonal language in which tone is contrastive both lexically and grammatically. The dataset marks tone systematically on vowels, following the AGLC convention:
High tone (H): á, é, ə́, í, ó, ɔ́, ú
Low tone (L): à, è, ə̀, ì, ò, ɔ̀, ù
Falling tone (HL): â, ê, ə̂, î, ô, ɔ̂, û
Rising tone (LH): ǎ, ě, ə̌, ǐ, ǒ, ɔ̌, ǔ
Unmarked vowels represent tonally neutral or contextually determined syllables. This explicit tone notation distinguishes the present dataset from the Common Voice Scripted Speech 25.0 – Ewondo resource, in which tone diacritics are systematically absent.
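To use the tone convention above in code, the marks can be read off after Unicode NFD decomposition. In that form each toned vowel becomes a base vowel followed by a combining acute (U+0301), grave (U+0300), circumflex (U+0302) or caron (U+030C). The Python sketch below returns one (vowel, tone) pair per vowel and uses "-" for unmarked vowels. As a simplifying assumption, it ignores tone marks written on syllabic nasals, such as ǹ and ń in hǹń. The function name is illustrative.

```python
import unicodedata

# Combining marks used for tone in the AGLC convention above.
TONE_OF_MARK = {
    "\u0301": "H",   # acute, as in á
    "\u0300": "L",   # grave, as in à
    "\u0302": "HL",  # circumflex, as in â
    "\u030C": "LH",  # caron, as in ǎ
}
VOWELS = set("aeəiouɔ")


def tone_sequence(text: str) -> list[tuple[str, str]]:
    """Return (vowel, tone) pairs in order; '-' marks an unmarked vowel.

    Tone marks on non-vowels (e.g. syllabic nasals) are skipped in this sketch.
    """
    pairs: list[tuple[str, str]] = []
    prev = ""
    for ch in unicodedata.normalize("NFD", text.lower()):
        if ch in VOWELS:
            pairs.append((ch, "-"))
        elif ch in TONE_OF_MARK and prev in VOWELS:
            pairs[-1] = (prev, TONE_OF_MARK[ch])
        prev = ch
    return pairs


print(tone_sequence("ábím"))  # [('a', 'H'), ('i', 'H')]
print(tone_sequence("nâ"))    # [('a', 'HL')]
```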
The dataset was compiled from scripted speech prompt lists read by native speakers of Ewondo in recording sessions held at the École Normale Supérieure de Yaoundé in 2026, in the framework of the Mozilla Data Collective project. Sentences were selected to provide broad phonological coverage of Ewondo and were transcribed in accordance with the AGLC orthographic standard, with full tone marking.
The dataset represents scripted speech in Ewondo, covering a broad range of everyday sentence types drawn from a general-purpose ASR/TTS prompt list. All utterances are scripted rather than spontaneous.
Total audio duration: 11,457 seconds (03:10:57), distributed across 1,781 MP3 audio clips in 19 recording sessions contributed by 16 native speakers of Ewondo. Total uncompressed dataset size: approximately [X] MB.
The dataset comprises:
1,781 MP3 audio clips read by 16 native speakers of Ewondo, with a total duration of 11,457 seconds (03:10:57), distributed across 19 recording sessions:
Session ewo_01: 97 clips (12m 03s)
Session ewo_02: 99 clips (14m 41s)
Session ewo_03: 99 clips (07m 52s)
Session ewo_04: 10 clips (01m 03s)
Session ewo_07: 99 clips (08m 51s)
Session ewo_08: 98 clips (08m 03s)
Session ewo_09: 96 clips (07m 09s)
Session ewo_13: 96 clips (10m 23s)
Session ewo_13-1: 97 clips (17m 29s)
Session ewo_14: 99 clips (12m 23s)
Session ewo_14-1: 99 clips (19m 09s)
Session ewo_15: 99 clips (08m 46s)
Session ewo_18: 100 clips (12m 04s)
Session ewo_18-1: 100 clips (10m 52s)
Session ewo_19: 96 clips (10m 21s)
Session ewo_22: 99 clips (05m 49s)
Session ewo_30: 99 clips (06m 55s)
Session ewo_31: 99 clips (08m 46s)
Session ewo_32: 100 clips (08m 08s)
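The per-session figures above can be cross-checked against the stated totals with a few lines of Python. The clip counts sum to exactly 1,781. The per-session durations as printed sum to 11,447 seconds (03:10:47), ten seconds less than the stated 11,457 seconds, so they are best treated as approximate. The grouping step relies on an assumption this card does not state: that a -1 suffix (as in ewo_13-1) marks a second session by the same speaker. Under that assumption the 19 sessions map onto 16 speakers. Grouping by speaker this way would keep both of a speaker's sessions on the same side of a train/test split.

```python
import re

# (clips, minutes, seconds) per session, copied from the list above.
SESSIONS = {
    "ewo_01": (97, 12, 3),
    "ewo_02": (99, 14, 41),
    "ewo_03": (99, 7, 52),
    "ewo_04": (10, 1, 3),
    "ewo_07": (99, 8, 51),
    "ewo_08": (98, 8, 3),
    "ewo_09": (96, 7, 9),
    "ewo_13": (96, 10, 23),
    "ewo_13-1": (97, 17, 29),
    "ewo_14": (99, 12, 23),
    "ewo_14-1": (99, 19, 9),
    "ewo_15": (99, 8, 46),
    "ewo_18": (100, 12, 4),
    "ewo_18-1": (100, 10, 52),
    "ewo_19": (96, 10, 21),
    "ewo_22": (99, 5, 49),
    "ewo_30": (99, 6, 55),
    "ewo_31": (99, 8, 46),
    "ewo_32": (100, 8, 8),
}


def speaker_of(session: str) -> str:
    """Assumed convention: a '-N' suffix marks a further session by the same speaker."""
    return re.sub(r"-\d+$", "", session)


def hms(seconds: int) -> str:
    """Format a duration in seconds as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


total_clips = sum(clips for clips, _, _ in SESSIONS.values())
total_seconds = sum(mins * 60 + secs for _, mins, secs in SESSIONS.values())

# Group sessions by inferred speaker, e.g. to build speaker-disjoint splits.
sessions_by_speaker: dict[str, list[str]] = {}
for name in SESSIONS:
    sessions_by_speaker.setdefault(speaker_of(name), []).append(name)

print(total_clips, total_seconds, hms(total_seconds), len(sessions_by_speaker))
# 1781 11447 03:10:47 16
```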
Nineteen per-session sentence-to-audio mapping files (mapping.tsv), each with four columns:
#audio_filename: filename of the audio clip (MP3)
#key: unique hash identifier of the recording
#sentence: sentence text as read by the speaker, transcribed in AGLC orthography with tone marking
#attempts: number of recording attempts before acceptance
| audio file | sentence (Ewondo, AGLC) |
|---|---|
| 1dbc5504f402c312236c645b271511f2.mp3 | Aa bɔŋ !!! Dɔŋ ósúsúa nâ bitá biá bɔ, ndɔ wa yə̌m fə, wa kad na wa yəm bǎn minlaŋ itə mivɔg, hǹń ? |
| def9b64235e4e0d803d23de18a665b6a.mp3 | Mə̌men makad nə ma yəm, ma kad nə mayəm. |
| 66efb6963a001b47d7989d273726871d.mp3 | Abim ma sili wa |
| 96e1bf490f6869899cd23134432393f2.mp3 | Iyɔŋ wa síli ma ábím ma yəm mə kadə́ wa. |
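Below is a minimal Python sketch for reading one session's mapping.tsv. It assumes:
- a header row names the four columns described above (a leading # on the names, as written in this card, is tolerated);
- the MP3 files sit next to the mapping file.

That directory layout and the function names are hypothetical. Sentences are normalised to Unicode NFC so that precomposed and combining forms of the same toned vowel compare equal. Quoting is disabled because transcripts may contain punctuation that a CSV reader would otherwise treat specially.

```python
import csv
import io
import unicodedata
from pathlib import Path


def parse_mapping(text: str) -> list[dict]:
    """Parse the contents of one mapping.tsv into a list of row dictionaries.

    Assumes a header row; a leading '#' on column names is stripped.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("#").strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        row = dict(zip(header, fields))
        row["sentence"] = unicodedata.normalize("NFC", row["sentence"])
        row["attempts"] = int(row["attempts"])
        rows.append(row)
    return rows


def load_session(session_dir: Path) -> list[tuple[Path, str]]:
    """Pair each clip path with its transcript (hypothetical layout:
    mapping.tsv and the MP3 files in the same directory)."""
    text = (session_dir / "mapping.tsv").read_text(encoding="utf-8")
    return [(session_dir / row["audio_filename"], row["sentence"])
            for row in parse_mapping(text)]


sample = "#audio_filename\t#key\t#sentence\t#attempts\nclip.mp3\tabc123\tAbim ma sili wa\t1\n"
print(parse_mapping(sample)[0]["sentence"])  # Abim ma sili wa
```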