License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Dataset ID:
cmpe4sp6100rznu07p61gmppd
Task: NLP
Release Date: 5/20/2026
Format: MP3, TSV
Size: 7.86 MB
Diboum_ALCAM-MultimodalDataset is a richly curated, multimodal linguistic dataset dedicated to the documentation and technological enhancement of the Diboum variety of Basaa (ISO 639-3: bas), a Bantu language of Cameroon. Diboum is a localised and socially embedded speech form that is rarely represented in standard grammatical descriptions or lexicographical resources. The dataset comprises three closely aligned components: (i) a structured datasheet containing carefully selected example sentences reflecting casual, albeit non-authentic, usage in the Diboum variety; (ii) high-quality audio recordings of these sentences, produced by a native speaker; and (iii) an explicit audio–sentence mapping file enabling precise alignment between the textual and acoustic data.
The dataset's primary added value lies in its explicit focus on the Diboum variety of Basaa. Diboum is classified as a dialect of Basaa (bas) both by the Ethnologue (https://www.ethnologue.com/language/bas/) and in the standard reference atlases of Cameroon's languages: the Atlas Linguistique du Cameroun by Breton and Bikia Fohtung (1991) and the Atlas Linguistique de l'Afrique Centrale: le Cameroun by Bibam Bikoi (2012). Like many other geographically and socially situated varieties of Basaa, Diboum typically remains invisible in reference grammars, dictionaries and educational materials, which often privilege more standardised or better-documented forms of the language. The dataset captures micro-variation in phonetics, phonology, morphosyntax and lexical choice that is essential for understanding socially situated linguistic practices rather than a homogeneous, abstract system. In this sense, the dataset contributes to a more inclusive representation of linguistic diversity within the Basaa speech community.
From a methodological perspective, the dataset is designed to bridge the gap between language documentation and language technology. The parallel availability of text in the Diboum variety and in French, alongside aligned speech, makes the dataset suitable for a wide range of applications, including automatic speech recognition (ASR), text-to-speech (TTS), machine translation (MT), forced alignment, pronunciation modelling and multimodal language learning tools. At the same time, the structured datasheet supports linguistic analysis, contrastive studies with other language varieties and pedagogical uses in teacher training and language revitalisation contexts. More broadly, the Diboum_ALCAM-MultimodalDataset exemplifies an approach to African language resources that highlights fluidity, longitudinal variation, orality and community-based practice.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
By downloading this dataset, you agree:
- to use it for research and scientific purposes only;
- not to re-host or re-share this dataset.
Forbidden Usage
You agree not to use the data for: determining the identity of the speaker in the dataset; attempting to clone the speaker's voice or training models that imitate the speaker; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Intended Use
(a) Speech-related tasks:
- Automatic speech recognition (ASR): Audio–text alignment allows the evaluation of speech recognition models for Diboum Basaa. Note, however, that the read sentences are transcribed phonetically in the IPA. An orthographic standard for Basaa does exist, the General Alphabet of Cameroon's Languages, which is closer to phonetic transcription than traditional missionary orthographies.
- Text-to-speech (TTS): As the dataset contains clean sentence–audio pairs, it can also be used to evaluate speech synthesis or text-to-speech models. Here again, note that the sentences are written in the IPA and not in the General Alphabet of Cameroon's Languages or any other conventional orthography for Basaa.
- Speech–text alignment/forced alignment benchmarking: Fine-grained, word-level segmentation provides ideal ground truth for evaluating phoneme- or word-level aligners.
(b) Translation and multilingual tasks:
- Machine translation (Diboum Basaa ↔ French): The sentence-level alignment between Diboum/Basaa and French makes it a parallel corpus for evaluating translation models, with the caveat that the Diboum side is in phonetic (IPA) transcription rather than a conventional orthography.
- Speech translation (speech-to-text)
(c) Linguistic and lexicographic tasks:
- Morphological analysis/glossed corpus studies: The morpheme-level glosses and noun class data are valuable for computational morphology, interlinear text modelling (ILTs) and grammar induction tasks for Basaa and related Bantu languages.
- Lexicon and part-of-speech tagging: These are useful for building linguistic resources such as dictionaries, morphological analysers or taggers for Diboum Basaa.
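For the ASR evaluation use listed above, one commonly used metric is the character error rate (CER). The sketch below is illustrative and not part of the dataset: the function names `edit_distance` and `cer` are hypothetical, and scoring on NFD-decomposed code points is an assumption, chosen so that a missing or wrong tone diacritic in the IPA transcription counts as one error instead of being hidden inside a precomposed character.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate over NFD code points (assumed scoring unit)."""
    ref = unicodedata.normalize("NFD", reference)
    hyp = unicodedata.normalize("NFD", hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# A hypothesis that drops the low tone on the first vowel:
print(round(cer("mà-hù", "ma-hù"), 3))  # 0.143 (1 error over 7 code points)
```

Other scoring units (grapheme clusters, or segments with tone stripped) are equally defensible; which one is appropriate depends on whether tone is part of the evaluation target.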
Diboum is classified as a dialect of Basaa (ISO 639-3: bas) by the Ethnologue (https://www.ethnologue.com/language/bas/) and in the two standard atlases of Cameroon's languages: the Atlas Linguistique du Cameroun (Breton and Bikia Fohtung 1991) and the Atlas Linguistique de l'Afrique Centrale: le Cameroun (Bibam Bikoi 2012). Basaa belongs to the Bantu branch of the Niger-Congo language family (Guthrie zone A.43). Basaa speakers are located primarily in the Littoral Region of Cameroon, in the Nkam and Nyong-et-Kéllé Divisions, as well as in the Centre Region. The Diboum variety is spoken in the Nkam Division in the Littoral Region.
At the time of publication of this dataset, we do not have a precise idea of the full scope of variation of Diboum, a variety which is itself considered a component of the Basaa dialect continuum. The relationship between Diboum and other attested varieties of Basaa (e.g. Mbaa, Ndog-bikim, Hijuk) has not been systematically characterised in the available literature.
The writing system used for the transcription of Diboum in this dataset is the International Phonetic Alphabet (IPA), as reflected in lexical entries (Word) and sentence-level examples (LangEx) in the datasheet. The phonological inventory described below is derived directly from the attested forms in the LangEx and Word columns of the datasheet.
The vowel system attested in the dataset is as follows:
i, e, ɛ, a, ɔ, o, u, ə
The consonant inventory reflected in the dataset includes the following simple, prenasalised, labialised and other consonants:
b, ɓ, by, c, d, dz, f, g, h, k, kw, l, m, mb, mv, n, nd, ng, ŋ, ŋg, p, r, s, t, v, w, y, z, ɲ, ɟ
These consonants appear consistently across noun stems, verbal forms, derivational patterns and noun-class alternations (e.g. ɲɔ̀ 'mouth', dìs 'eye', kíŋ 'head', hù 'ear', bì-kíŋ 'heads', mà-hù 'ears').
The datasheet shows lexically and grammatically contrastive tones, marked directly on vowels and on the sonorant consonants m and n. The following tonal categories are attested in the LangEx column:
High tone (H): á, é, ɛ́, í, ó, ɔ́, ú, ə́, ń, ḿ
Low tone (L): à, è, ɛ̀, ì, ò, ɔ̀, ù, ə̀, ǹ, m̀
Falling contour tone (HL): â, ê, î, ô, ɔ̂, û
Rising contour tone (LH): ǎ, ě, ǐ, ǒ, ǔ
Mid/level tone: attested on a restricted set of items, marked with a macron (e.g. lāyū, mā-bē)
Unmarked vowels represent tonally neutral or contextually determined syllables.
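The tone marking described above can be read off programmatically. The following is a minimal sketch, assuming the transcription uses the standard Unicode combining diacritics (acute, grave, circumflex, caron, macron) either precomposed or as separate combining marks; the name `tone_sequence` is hypothetical. Unmarked syllables produce no label, so the output is a list of marked tones, not a full syllable-by-syllable tone tier.

```python
import unicodedata

# Combining diacritics corresponding to the tonal categories listed above.
TONE_MARKS = {
    "\u0301": "H",   # acute: high
    "\u0300": "L",   # grave: low
    "\u0302": "HL",  # circumflex: falling
    "\u030C": "LH",  # caron: rising
    "\u0304": "M",   # macron: mid/level
}


def tone_sequence(text):
    """Return the tone labels marked in `text`, in order of appearance."""
    # NFD splits precomposed letters (e.g. "à") into base letter + diacritic,
    # so tones on vowels and on m/n are handled the same way.
    decomposed = unicodedata.normalize("NFD", text)
    return [TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS]


print(tone_sequence("mà-hù mé má ná lāyū"))
# ['L', 'L', 'H', 'H', 'H', 'M', 'M']
```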
The data reflect an active noun class system typical of Bantu languages, with prefixes marking singular/plural alternations (e.g. dì-sòŋ / mà-sòŋ 'ear of maize / ears of maize'; kíŋ / bì-kíŋ 'head / heads'; è-lím / bì-lím 'tongue / tongues'). The class prefixes attested in the dataset include: à-, bà-, bì-, bí-, by-, dì-, è-, mà-, mì-, m̀-, among others.
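As a small illustration of the singular/plural alternations above, the sketch below splits a single word form at its first hyphen and checks that paired forms share a stem. It assumes that the hyphen in the transcription separates the class prefix from the stem, as in the examples quoted here; `split_prefix` is a hypothetical helper, and forms without a hyphen are treated as having no overt prefix.

```python
import unicodedata


def split_prefix(form):
    """Split 'prefix-stem' at the first hyphen; prefix is '' if there is none."""
    form = unicodedata.normalize("NFC", form)
    prefix, sep, stem = form.partition("-")
    return (prefix, stem) if sep else ("", form)


# Singular/plural pairs quoted in the description above.
pairs = [("dì-sòŋ", "mà-sòŋ"), ("kíŋ", "bì-kíŋ"), ("è-lím", "bì-lím")]

for singular, plural in pairs:
    sg_prefix, sg_stem = split_prefix(singular)
    pl_prefix, pl_stem = split_prefix(plural)
    print(f"{sg_prefix or 'Ø'}- / {pl_prefix or 'Ø'}-  shared stem: {sg_stem == pl_stem}")
```

The helper is meant for isolated word forms from the Word column; sentence-level examples contain several hyphenated words and would need to be split on whitespace first.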
The dataset was collected with a linguistic questionnaire designed to elicit basic lexical and grammatical information about Diboum, as part of the Atlas Linguistique du Cameroun (ALCAM) project.
Total size is approximately 8.6 MB (uncompressed), comprising 8.52 MB of MP3 audio files and approximately 100 KB of TSV data files.
The total duration of the 337 audio recordings is 1068.9 seconds (17 minutes 49 seconds).
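The duration figures can be checked with a few lines of arithmetic. The sketch below uses only the two numbers stated above (1068.9 seconds, 337 recordings); the mean clip length it prints is derived from them and is not a figure reported by the dataset authors. The function name `summarize` is hypothetical.

```python
def summarize(total_seconds, n_clips):
    """Return (whole minutes, remaining seconds, mean seconds per clip)."""
    minutes, seconds = divmod(total_seconds, 60)
    return int(minutes), seconds, total_seconds / n_clips


minutes, seconds, mean_clip = summarize(1068.9, 337)
print(f"{minutes} min {seconds:.1f} s")         # 17 min 48.9 s, i.e. about 17 min 49 s
print(f"mean clip length: {mean_clip:.2f} s")   # mean clip length: 3.17 s
```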
The dataset comprises: 1) a datasheet (Diboum-ALCAM-MultimodalDataset.tsv) with 375 lines and 20 columns; 2) 337 voice clips read by a single native speaker, stored as MP3 files in the tts_dataset subfolder; 3) a sentence-to-audio mapping file (mapping.tsv) with 337 lines and 4 columns.
Columns of the datasheet (Diboum-ALCAM-MultimodalDataset.tsv):
#OrigID: original number of the lexical entry on the paper questionnaire
#EditID: modification of #OrigID
#FrenchRef: reference entry (originally provided in French)
#FrenchComm: original comments about reference entry (#FrenchRef)
#French: lexical entry in French (overlaps with #FrenchRef)
#Note: note of researcher on the lexical entry
#POS: part of speech
#Class: noun class (where applicable)
#Morf: morphological attribute (e.g. plural, singular)
#Var: (na)
#Word: lexical entry in Diboum
#CrossRef: cross-referencing of lexical entry number
#FrenchEx: example sentence in French
#LangEx: example sentence in Diboum
#LangExEdit: manual editing of #LangEx
#FrenchExEdit: edited French equivalent of #FrenchEx
#LangPars: word-for-word parsing in Diboum
#LangParsEdit: editing of #LangPars
#FrenchPars: French equivalent of #LangParsEdit
#FrenchParsEdit: editing of #FrenchPars
Columns of the mapping file (mapping.tsv):
#audio_filename: name of the MP3 audio file
#key: MD5-based identifier shared with the audio filename (without extension)
#sentence: lexical item and/or example sentence read by the speaker
#attempts: number of recording attempts before the selected take
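A minimal sketch of how the mapping file might be sanity-checked, assuming that the four fields listed above are the columns of mapping.tsv, that the file starts with a header row, and that the header uses these names without the leading `#`. None of this is confirmed by the dataset description, so adjust the names to the actual header. The check relies only on the stated property that `key` equals the audio filename without its extension; it does not recompute any MD5 digest, since the hashed input is not documented. The `attempts` values in the sample are placeholders, not data from the dataset.

```python
import csv
import io
import re
from pathlib import PurePosixPath

MD5_HEX = re.compile(r"[0-9a-f]{32}")


def find_mapping_problems(tsv_file):
    """Return (line number, message) for rows whose key and filename disagree."""
    problems = []
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the assumed header
        stem = PurePosixPath(row["audio_filename"]).stem
        if row["key"] != stem:
            problems.append((line_no, "key differs from filename stem"))
        elif not MD5_HEX.fullmatch(row["key"]):
            problems.append((line_no, "key is not a 32-character hex digest"))
    return problems


# In-memory sample; for the real file one would use
# open("mapping.tsv", encoding="utf-8", newline="") instead.
sample = (
    "audio_filename\tkey\tsentence\tattempts\n"
    "91ceb854c1c06bea68557601cbe1c4fe.mp3\t91ceb854c1c06bea68557601cbe1c4fe\tmì-ɲɔ̀\t1\n"
    "1a140de194c2168aa188bb12b392227e.mp3\tnot-a-key\tmìs\t1\n"
)
print(find_mapping_problems(io.StringIO(sample)))
# [(3, 'key differs from filename stem')]
```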
| audio file | words & sentences |
|---|---|
| 53fa2dccd478e3d6519d580acc8e76e7.mp3 | ɲɔ̀ ; à bí ɲɔ̀ sà |
| 91ceb854c1c06bea68557601cbe1c4fe.mp3 | mì-ɲɔ̀ |
| 72b1abf8f3504dde028abe3c312e6762.mp3 | dìs ; bá béká báà tò mìs |
| 1a140de194c2168aa188bb12b392227e.mp3 | mìs |
| 3975c30a65b70ff6fde1563a399585a8.mp3 | m̀-ró ; à bí m-ró keŋ dì kíŋ rà |
| 2badc3166db8b1509da93d2ec05f0750.mp3 | hù ; mà-hù mé má ná lāyū |
| 3c164ed5ffb0fac8eba9074e56b09da1.mp3 | dì-sòŋ ; mà-sòŋ má m-byó máā |
| 7ccd896bbcd26882efbdd66ddf24b69b.mp3 | è-lím ; à kòk(ò) lá è-lím |
| 40d7509cb8bf58f395dbd6e12f327751.mp3 | mà-sòŋ |
| 02240d7b41a7a175c20a0f854fbf4deb.mp3 | mà-hù |
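In the sample rows above, some cells hold a lexical item followed by an example sentence, separated by " ; ", while others hold a single form. The sketch below separates the two parts. The separator convention is inferred from these ten rows only and may not hold across the whole file; `split_entry` is a hypothetical helper.

```python
def split_entry(cell):
    """Split 'word ; sentence' into (word, sentence); sentence is None if absent."""
    word, sep, sentence = cell.partition(";")
    return word.strip(), (sentence.strip() if sep else None)


print(split_entry("hù ; mà-hù mé má ná lāyū"))  # word plus example sentence
print(split_entry("mìs"))                       # single form, no sentence
```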