Ewondo-French Parallel Corpus

Language

Ewondo is a Narrow Bantu language spoken by a population located mainly in the Centre Region of Cameroon, with pockets of settlement in the South and East Regions. It also serves as a vehicular language for populations in the South and East Regions, and has given rise to a creole known as Mongo Ewondo.

Variants

The term 'Ewondo' is used to describe a set of linguistic varieties whose speakers may or may not identify with the term. This is partly due to the structures of linguistic governance. In Cameroon, a nationwide linguistic survey was undertaken in the second half of the 1970s and the first half of the 1980s as part of the Atlas Linguistique du Cameroun project. The survey resulted in the publication of the Administrative Atlas of Cameroonian Languages. This work identifies a macro-language called Beti-Fang, with Ewondo as one of its major micro-languages alongside Fang, Bulu, Ntumu and Eton. Other subgroups speaking varieties that differ to a greater or lesser extent have often been subsumed under one of the more prominent Beti-Fang micro-languages. Consequently, it is very difficult to say with confidence which variables would allow a particular linguistic variety to be categorised as Ewondo without distorting reality.

Writing System

1. Vowels

The orthography is Latin-based, with optional tone marking. Vowel inventory: a, e, ə, i, o, u. Long vowels are written by doubling the letter.

2. Consonants

Simple consonants: b, d, f, g, h, k, l, m, n, p, s, t, v, w, y, z
Digraphs / prenasalized: mb, nd, ng, nk, nz, ny, dz, ts
Special symbols: ŋ, ə
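The digraphs listed above mean that one written unit can span two characters. A minimal sketch of splitting a word into orthographic units follows, assuming lowercase input with tone marks already removed. The names `DIGRAPHS` and `segment` are illustrative and not part of the dataset, and greedy left-to-right matching is an assumption that may misanalyse sequences falling across a morpheme boundary.

```python
# Digraphs as listed in the consonant inventory above.
DIGRAPHS = ("mb", "nd", "ng", "nk", "nz", "ny", "dz", "ts")


def segment(word: str) -> list[str]:
    """Split a lowercase, tone-free word into orthographic units.

    Matching is greedy and left to right: whenever the next two
    characters form a listed digraph, they are kept together.
    """
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


print(segment("ndɔ"))
print(segment("minlaŋ"))
```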

3. Tone system

Tone marking is optional and encoded using diacritics:

Grave accent (◌̀): low tone
Acute accent (◌́): high tone
Caron (◌̌): rising tone
Circumflex (◌̂): falling tone
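
Because tone marking is optional, the same word may appear with or without diacritics. A minimal sketch of removing the four tone marks listed above follows, which can help when comparing or counting Ewondo word forms. The function name `strip_tones` is illustrative and not part of the dataset, and the approach (Unicode decomposition followed by filtering) is one possible choice rather than a documented preprocessing step.

```python
import unicodedata

# Combining marks that the list above gives as tone diacritics.
TONE_MARKS = {
    "\u0300",  # grave accent: low tone
    "\u0301",  # acute accent: high tone
    "\u030C",  # caron: rising tone
    "\u0302",  # circumflex: falling tone
}


def strip_tones(text: str) -> str:
    """Return text without tone diacritics; base letters are kept."""
    # NFD separates precomposed letters (e.g. "â") into base + mark.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC recomposes anything that can still be recomposed.
    return unicodedata.normalize("NFC", kept)


print(strip_tones("nâ bitá"))
```

This should only be applied to the Ewondo side of the corpus: the same combining marks carry French accents (é, è, ê), so running it on the French text would strip them as well.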

Source

The text used in this dataset was transcribed from interviews conducted in Yaoundé in the 1980s by Professor Kum A. Ndumbe and his research team. The recordings documented personal histories of German colonisation. The transcriptions and French translations were produced between 2016 and 2017 by Hubert Fernand Nkoumou. Ewondo–French alignment was performed in the process of creating this dataset.

Domain

This dataset is a transcription of prompted speech in the form of a directed interview. The aim of the interview was to elicit personal stories about the German colonial experience in Cameroon. Similar interviews were conducted in many other languages and locations across Cameroon.

Size

388 KB

Structure

This parallel corpus comprises 3,631 lines, each consisting of a translation unit in both the source and target languages. The Ewondo source text has 28,232 tokens, while the French target text has 36,000 tokens.

Metric	Count
File size	388 KB
Lines	3,631
Ewondo tokens	28,232
French tokens	36,000

Sample

1911 aa ? | En 1911 ?
Hǹǹń !!! 1911. | Oui. 1911.
Aa bɔŋ !!! Dɔŋ ósúsúa nâ bitá biá bɔ, ndɔ wa yə̌m fə, wa kad na wa yəm bǎn minlaŋ itə mivɔg, hǹń ? | Ah bon !!! Donc avant que la guerre ait lieu, et tu connais aussi, tu dis que tu connais quelques histoires de cette époque là, n'est-ce pas ?
Mə̌men makad nə ma yəm, ma kad nə mayəm. | Moi meme je dis que je connais, je dis que je connais.
Abim ma sili wa | Ce que je vais te demander ?
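
The sample lines above can be split into source and target segments and counted. The sketch below assumes that each line holds one Ewondo segment and one French segment separated by " | ", as displayed in the sample; the actual file layout may differ. It also assumes whitespace tokenisation, which is not necessarily the method behind the token counts reported under Structure. The names `parse_line` and `count_tokens` are illustrative.

```python
# Three lines copied from the sample above.
SAMPLE = """1911 aa ? | En 1911 ?
Hǹǹń !!! 1911. | Oui. 1911.
Abim ma sili wa | Ce que je vais te demander ?"""


def parse_line(line: str, sep: str = " | ") -> tuple[str, str]:
    """Split one line into (ewondo, french) on the first separator.

    Raises ValueError if the separator is missing.
    """
    source, target = line.split(sep, 1)
    return source.strip(), target.strip()


def count_tokens(segments) -> int:
    """Count whitespace-delimited tokens over an iterable of strings."""
    return sum(len(segment.split()) for segment in segments)


pairs = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
ewondo_tokens = count_tokens(src for src, _ in pairs)
french_tokens = count_tokens(tgt for _, tgt in pairs)

print(len(pairs), ewondo_tokens, french_tokens)
```

Under whitespace tokenisation, punctuation that stands alone (such as "?" or "!!!") counts as a token, so the totals depend on how the transcription spaces its punctuation.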

Description

Specifics

Considerations

Processes

Metadata