Institute of African Digital Humanities

Website: inhunumaf.hypotheses.org

About

The Institute of African Digital Humanities (IADH), founded in 2021 as the Institut des Humanités Numériques d'Afrique francophone (Institute of Digital Humanities of Francophone Africa), is a research and practice network that applies digital and AI methods to humanities research focused on Africa. It provides a collaboration platform for Digital Humanities practitioners across the continent. Since late 2024, IADH has concentrated on designing and publishing NLP and machine learning datasets for African languages, prioritizing low-resourced and underrepresented ones.

Achievements

BOUQuET Translation Dataset (Mozilla Foundation)

IADH produced gold-standard human translations of 1,364 sentences, grouped into 324 paragraphs, in five Cameroonian languages (Basaa, Eton, Duala, Bafia, and Tupuri) for the Benchmark and Open initiative for Universal Quality Evaluation in Translation (BOUQuET), an international AI translation benchmark developed with the Mozilla Foundation. All translations were produced exclusively by human specialists, without any use of machine translation or generative AI.

46 Languages on Mozilla Common Voice

IADH launched 46 under-served African languages on Mozilla Common Voice, including Adamawa Fulfulde, Bafia, Bafut, Bakoko, Bamun, Bamvele, Bankon, Baoulé, Batanga, Borgu Fulfulde, Bulu, Cameroon Pidgin English, Dagbani, Duala, Ebrie, Eton, Ewondo, Fang, Fe'fe', Gbaya, Ghomala, Gidar, Giziga, Ibibio, Kom, Mada, Masana, Mbum, Mbo, Medumba, Mokpwe, Mpiemo, Mpumpong, Mundang, Mungaka, Musey, Musgu, Ngiemboon, Ngomba, Ngombale, Ouldeme, Tuki, Tunen, Tupuri, and Yangben. Most of these languages had no prior digital speech presence.

32 Datasets on the Mozilla Data Collective

IADH has published 32 original datasets on the Mozilla Data Collective, all under the NOODL-1.0 license, covering more than 20 African languages from Cameroon, Congo, Nigeria, and elsewhere in West Africa.

Speech: Bati ASR, Beembe TTS, Bomitaba TTS, Bulu TTS, Bamun TTS, Ewondo TTS, Hausa TTS, Kituba TTS, Laari ASR, Lingala TTS, Mbosi TTS, Naija TTS, Suundi TTS, Teke-Laali TTS, Yaka TTS, Yoruba TTS.

ALCAM multimodal (IPA + audio + French): Akoose, Basaa, Bulu, Ewondo-Yanda, Ewondo-Fong, Ewondo-Mbida-Mbani, Mvele, Yezoum.

Parallel corpora: Adamawa Fulfulde–French, Bamun–French 1.1, Bamun–French 2.0, Ewondo–French, Mada–French.

Text corpora: FUB Narratives, Mada Narratives, Spoken Congolese French.

Datasets

36 Datasets

Dataset | License | Language code | Task | Formats | Size
Adamawa Fulfulde-French Parallel Corpus of Narratives 1.2 | NOODL-1.0 | fub | MT | TSV | 112.17 KB
Adamawa Fulfulde-TTS-Dataset | NOODL-1.0 | fub | TTS | MP3, TSV | 169.27 MB
Akoose-ALCAM-MultimodalDataset | NOODL-1.0 | bss | NLP | MP3, TSV | 16.05 MB
Bamun-French Parallel Corpus 1.1 | NOODL-1.0 | bax | MT | TSV | 99.78 KB
Bamun-French Parallel Corpus 2.0 | NOODL-1.0 | bax | MT | TSV | 184.29 KB
Bamun-TTS-Dataset | NOODL-1.0 | bax | TTS | MP3, TSV | 219.97 MB
Basaa-ALCAM-MultimodalDataset | NOODL-1.0 | bas | NLP | MP3, TSV | 14.66 MB
Bati-MultiDialectalASR-Dataset | NOODL-1.0 | btc | ASR | WAV, TSV | 3.27 GB
Beembe-TTS-Dataset | NOODL-1.0 | beq | TTS | WAV, TSV | 861.46 MB
Bomitaba-TTS-Dataset | NOODL-1.0 | zmx | TTS | WAV, TSV | 1.00 GB
Bulu-TTS-Dataset 1.0 | NOODL-1.0 | bum | TTS | MP3, TSV | 87.40 MB
Bulu_ALCAM-MultimodalDataset | NOODL-1.0 | bum | NLP | MP3, TSV | 31.28 MB
Efik-TTS-Dataset | NOODL-1.0 | efi | TTS | MP3, TSV | 297.28 MB
Ewondo-TTS-Dataset | NOODL-1.0 | ewo | TTS | MP3, TSV | 152.70 MB
Ewondo-Yanda-ALCAM-MultimodalDataset | NOODL-1.0 | ewo | NLP | MP3, TSV | 18.09 MB
Ewondo_Fong_ALCAM-MultimodalDataset | NOODL-1.0 | ewo | NLP | MP3, TSV | 16.80 MB
Ewondo_Mbida-Mbani_ALCAM-MultimodalDataset | NOODL-1.0 | ewo | NLP | MP3, TSV | 19.25 MB
FUB-Narratives | NOODL-1.0 | fub | NLP | TXT | 168.34 KB
Hausa-TTS-Dataset | NOODL-1.0 | hau | TTS | MP3, TSV | 276.90 MB
Igbo-TTS-Dataset | NOODL-1.0 | ibo | TTS | MP3, TSV | 172.77 MB
isiXhosa-TTS-Dataset | NOODL-1.0 | xho | TTS | MP3, TSV | 276.02 MB
Kituba-TTS-Dataset | NOODL-1.0 | mkw | TTS | WAV, TSV | 553.28 MB
Laari-TTS-Dataset | NOODL-1.0 | ldi | ASR | WAV, TRJS, TSV | 568.26 MB
Lingala-TTS-Dataset | NOODL-1.0 | lin | TTS | WAV, TSV | 962.04 MB
Mada Narratives | NOODL-1.0 | mxu | NLP | TXT | 65.04 KB
Mada-French Parallel Corpus 1.0 | NOODL-1.0 | mxu | TTS | TSV | 122.37 KB
Mbosi-TTS-Dataset | NOODL-1.0 | mdw | TTS | WAV, TSV | 644.39 MB
Mvele_ALCAM-MultimodalDataset | NOODL-1.0 | ewo | NLP | MP3, TSV | 14.13 MB
Naija-TTS-Dataset | NOODL-1.0 | pcm | TTS | MP3, TSV | 324.82 MB
Spoken-Congolese-French-Dataset | NOODL-1.0 | fr-CG | NLP | MP3, WAV, TSV | 3.44 GB
Suundi-TTS-Dataset | NOODL-1.0 | sdj | TTS | WAV, TSV | 240.50 MB
Teke-Laali-TTS-Dataset | NOODL-1.0 | lli | TTS | WAV, TSV | 635.61 MB
Tiv-TTS-Dataset | NOODL-1.0 | tiv | TTS | MP3, TSV | 311.58 MB
Yaka-TTS-Dataset | NOODL-1.0 | iyx | TTS | WAV, TSV | 1.26 GB
Yezoum_ALCAM-MultimodalDataset | NOODL-1.0 | ewo | NLP | MP3, TSV | 12.81 MB
Yoruba-TTS-Dataset | NOODL-1.0 | yor | TTS | MP3, TSV | 319.05 MB