License: NOODL-1.0
Steward: Institute of African Digital Humanities
Dataset ID: cmpedjr9y00wqnv07styloz0d
Task: MT
Release Date: 5/20/2026
Format: TSV
Size: 137.84 KB
This dataset is a parallel corpus of Ewondo and French texts. The text was obtained by transcribing raw audio files recorded in Yaoundé in the 1980s. Transcriptions and French translations were produced between 2016 and 2017 by Hubert Fernand Nkoumou, the legal owner of this dataset. For the purpose of creating this dataset, Ewondo and French text alignment was performed and quality-checked. The corpus is suitable for machine translation and other natural language processing tasks on the Ewondo language.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
By downloading this dataset, you agree:
- To use it for research and scientific use only
- That you will not re-host or re-share this dataset
Forbidden Usage
You agree not to use the data for:
- Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
This dataset is intended for the training or testing of machine translation models. Its purpose is to support the learning and revitalisation of the Ewondo language, and to contribute to the development of practical natural language processing tools and endogenous multilingual education resources in Cameroon.
Ewondo is a Narrow Bantu language spoken by a population located mainly in the Centre Region of Cameroon, with pockets of settlement in the South and East Regions. It serves as a vehicular language for populations in the South and East Regions of Cameroon, and has also developed into a creole known as Mongo Ewondo.
The term 'Ewondo' is used to describe a set of linguistic varieties whose speakers may or may not identify with the term. This is partly due to the structures of linguistic governance. In Cameroon, a nationwide linguistic survey was undertaken in the second half of the 1970s and the first half of the 1980s as part of the Atlas Linguistique du Cameroun project. The survey resulted in the publication of the Administrative Atlas of Cameroonian Languages. This work identifies a macro-language called Beti-Fang, with Ewondo as one of its major micro-languages alongside Fang, Bulu, Ntumu and Eton. Other subgroups speaking varieties that differ to a greater or lesser extent have often been subsumed under one of the more prominent Beti-Fang micro-languages. Consequently, it is very difficult to determine with confidence which variables would allow a particular linguistic variety to be categorised as Ewondo without distorting reality.
Latin-based orthography with optional tone marking.
Vowel inventory: a, e, ə, i, o, u (long vowels by doubling)
Simple consonants: b, d, f, g, h, k, l, m, n, p, s, t, v, w, y, z
Digraphs / Prenasalized: mb, nd, ng, nk, nz, ny, dz, ts
Special symbols: ŋ, ə
Tone marking is optional and encoded using diacritics:
Grave accent (◌̀): low tone
Acute accent (◌́): high tone
Caron (◌̌): rising tone
Circumflex (◌̂): falling tone
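Because tone marking is optional, it can be useful to check how densely a given line is tone-marked before comparing or evaluating text. The following is a minimal sketch in Python, not part of the dataset tooling: the function name `tone_profile` and the `TONE_MARKS` table are hypothetical, and the sketch assumes the four diacritics listed above are the only tone marks used. It only inspects the text and does not alter it.

```python
import unicodedata
from collections import Counter

# Combining marks listed in the dataset description, mapped to tone labels.
# Assumption: no other diacritics are used to mark tone.
TONE_MARKS = {
    "\u0300": "low",      # combining grave accent
    "\u0301": "high",     # combining acute accent
    "\u030C": "rising",   # combining caron
    "\u0302": "falling",  # combining circumflex
}

def tone_profile(text: str) -> Counter:
    """Count tone diacritics in an Ewondo string, grouped by tone label."""
    # NFD splits precomposed letters (e.g. U+00E1) into base letter plus
    # combining mark, so both encodings are counted the same way.
    decomposed = unicodedata.normalize("NFD", text)
    return Counter(TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS)

# "M" + schwa + combining caron + "men": one rising tone.
print(tone_profile("M\u0259\u030Cmen makad"))
```

The same diacritics also occur in French spelling (for example é or ê), so a check like this is only meaningful on the Ewondo side of each translation unit.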
The text used in this dataset was transcribed from interviews conducted in Yaoundé in the 1980s by Professor Kum A. Ndumbe and his research team. The recordings documented personal histories of German colonisation. The transcriptions and French translations were produced between 2016 and 2017 by Hubert Fernand Nkoumou. Ewondo–French alignment was performed in the process of creating this dataset.
This dataset is a transcription of prompted speech in the form of a directed interview. The aim of the interview was to elicit personal stories about the German colonial experience in Cameroon. Similar interviews were conducted in many other languages and locations across Cameroon.
This parallel corpus comprises 3,631 lines, each consisting of a translation unit in both the source and target languages. The Ewondo source text has 28,232 tokens, while the French target text has 36,000 tokens.
| Metric | Count |
|---|---|
| File size | 388 KB |
| Lines | 3,631 |
| Ewondo tokens | 28,232 |
| French tokens | 36,000 |
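The figures above could be checked against a local copy of the file with a short script. This is a sketch under stated assumptions: it assumes each TSV line holds the Ewondo text in the first column and the French text in the second, with no header row, and it counts whitespace-separated tokens. The tokenisation used for the published counts is not documented here, so the results may differ. The function name `corpus_stats` is hypothetical.

```python
import csv
from typing import Iterable, Tuple

def corpus_stats(rows: Iterable[str]) -> Tuple[int, int, int]:
    """Return (translation units, Ewondo tokens, French tokens)."""
    units = ewondo_tokens = french_tokens = 0
    # QUOTE_NONE treats quotation marks in the text as ordinary characters.
    reader = csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        units += 1
        # Assumed column order: Ewondo first, French second.
        ewondo_tokens += len(fields[0].split())
        french_tokens += len(fields[1].split())
    return units, ewondo_tokens, french_tokens

# Two in-memory lines stand in for the real file.
sample = [
    "1911 aa ?\tEn 1911 ?",
    "Abim ma sili wa\tCe que je vais te demander ?",
]
print(corpus_stats(sample))  # (2, 7, 10)
```

To run it on the actual corpus, pass an open file handle instead of the sample list, for example one opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded TSV was saved.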
1911 aa ? | En 1911 ?
Hǹǹń !!! 1911. | Oui. 1911.
Aa bɔŋ !!! Dɔŋ ósúsúa nâ bitá biá bɔ, ndɔ wa yə̌m fə, wa kad na wa yəm bǎn minlaŋ itə mivɔg, hǹń ? | Ah bon !!! Donc avant que la guerre ait lieu, et tu connais aussi, tu dis que tu connais quelques histoires de cette époque là, n'est-ce pas ?
Mə̌men makad nə ma yəm, ma kad nə mayəm. | Moi meme je dis que je connais, je dis que je connais.
Abim ma sili wa | Ce que je vais te demander ?