Sample Dagbani-TTS-Dataset

Description

This dataset comprises 2,488 high-quality audio recordings of read speech produced by a single Dagbani speaker over 16 sessions. Dagbani (ISO 639-3: dag), also known as Dagbane or Dagomba, is a Gur language of the Niger-Congo family spoken primarily in the Northern Region of Ghana, particularly in the Dagbon traditional area. It is the most widely spoken language in northern Ghana and serves as a lingua franca across the region. Despite being spoken by an estimated 3 to 4 million people, Dagbani remains severely under-resourced in terms of digital speech data, making this dataset a significant contribution to natural language processing efforts for the language. Audio files are provided in MP3 format (approx. 185 MB), totalling 2 hours, 50 minutes and 59 seconds of speech. The dataset includes 16 audio/sentence mapping files in TSV format, containing 2,488 aligned audio/sentence pairs in total. Transcriptions follow the standard Dagbani orthography as developed by the Dagbani Orthography Committee and used in published Dagbani materials. The recordings draw on a range of textual material in Dagbani, offering prosodic and lexical diversity for training and evaluating TTS and ASR models. The dataset is intended for research and scientific use in speech technology for Dagbani.

Language

Dagbani (ISO 639-3: dag), also known as Dagbane or Dagomba, is a Gur language of the Niger-Congo family (specifically the Oti-Volta branch) spoken primarily in the Northern Region of Ghana, in the traditional Dagbon kingdom centred on the town of Yendi. It is the most widely spoken language of northern Ghana, with an estimated 3 to 4 million speakers, and functions as a regional lingua franca across the north of the country. Dagbani is typologically characterised by agglutinative morphology, subject-verb-object (SVO) word order, and a system of noun classes. The language has a rich tonal system in which tone is lexically and grammatically distinctive. Dagbani also exhibits nasal vowels, prenasalised consonants, and labialised and palatalised consonant clusters.

Variants

The recordings in this dataset were produced by a single speaker of Dagbani from Ghana. The variety recorded is representative of the standard written and broadcast norm of Dagbani as used in the Dagbon area. No sub-variety distinctions are encoded in the dataset.

Alphabet

The orthography used in the transcription of the audio recordings follows the standard Dagbani writing system as developed and standardised by the Dagbani Orthography Committee, in use for Dagbani publications, literacy materials, and broadcast media in Ghana. The orthography is based on the Latin alphabet and employs a set of conventions to represent sounds specific to Dagbani. Tonal distinctions, while present in the spoken language, are not systematically marked in the standard orthography. Dagbani makes use of several digraphs (e.g., gb, kp, ny, ng) to represent labial-velar stops and palatal/velar nasals. Additional special characters include ɛ (open-mid front vowel), ɔ (open-mid back vowel), ŋ (velar nasal), and ɣ (voiced velar fricative), which are standard in the orthography. Vowel nasalisation is indicated by a following n in certain environments. The result is an orthographic system that is widely used in Dagbani literacy and publishing, making the transcriptions in this dataset consistent with established written norms.
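Because the orthography relies on characters outside ASCII, it can be useful to normalise transcriptions and take an inventory of those characters before building a TTS or ASR vocabulary. The following is a minimal sketch of one way to do this; the function name non_ascii_letters is hypothetical, and the choice of NFC normalisation and lowercasing is an assumption rather than a documented property of the dataset.

```python
import unicodedata
from collections import Counter


def non_ascii_letters(sentences):
    """Count alphabetic characters outside ASCII across the given sentences.

    Assumption: text is normalised to NFC and lowercased first, so that
    upper- and lowercase forms (e.g. Ŋ and ŋ) are counted together.
    """
    counts = Counter()
    for sentence in sentences:
        normalised = unicodedata.normalize("NFC", sentence).lower()
        counts.update(
            ch for ch in normalised if ch.isalpha() and not ch.isascii()
        )
    return counts


if __name__ == "__main__":
    # Two fragments taken from the sample rows in this card.
    sample = ["ZAŊ TI PIRAMƐRI", "Yilimiya yila maa ka buɣisima."]
    for char, count in sorted(non_ascii_letters(sample).items()):
        print(char, count)
```

Running the same count over all sentences gives a quick check that the character set matches the one expected by a downstream tokenizer or grapheme inventory.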

Source

This dataset was compiled as part of a Text-to-Speech data collection initiative for Dagbani. The textual source material consists of a range of texts in Dagbani, including educational, narrative, and informational content. The speaker read prepared passages drawn from these texts, providing naturalistic and culturally grounded speech data. Recordings were made across 16 sessions and subsequently curated, deduplicated, and aligned with their corresponding transcriptions.

Domain

The dataset consists of prompted read speech in Dagbani. The textual source material derives from a range of registers representative of standard written Dagbani, including educational, narrative, and informational texts. The recordings offer good prosodic and lexical diversity, with varied sentence lengths, clause structures, and registers, making them suitable for TTS model training and ASR evaluation.

Size

Total size of MP3 audio: approx. 185 MB
Total size of TSV mapping files: approx. 300 KB
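As a rough sanity check, the figures reported in this card (2,488 clips, 2 hours 50 minutes 59 seconds, approx. 185 MB of MP3 audio) imply an average clip length and an average bitrate. The sketch below works through that arithmetic. It assumes that "MB" means 10^6 bytes, and because the size is only approximate the bitrate is an estimate, not a documented encoding setting.

```python
def total_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def average_clip_seconds(duration_s, n_clips):
    """Mean clip duration in seconds."""
    return duration_s / n_clips


def average_kbps(size_bytes, duration_s):
    """Mean bitrate in kilobits per second (1 kbit = 1000 bits)."""
    return size_bytes * 8 / duration_s / 1000


if __name__ == "__main__":
    duration = total_seconds(2, 50, 59)   # figures from this card
    clips = 2488                          # figure from this card
    size = 185 * 10**6                    # assumption: 1 MB = 10^6 bytes

    print(f"total duration: {duration} s")
    print(f"average clip:   {average_clip_seconds(duration, clips):.2f} s")
    print(f"average rate:   {average_kbps(size, duration):.0f} kbps (estimate)")
```

Under these assumptions the clips average a little over four seconds each, which is consistent with sentence-level read speech.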

Structure

This dataset comprises audio clips and audio/text mapping files organised across 16 recording sessions. There are 2,488 audio clips in MP3 format, totalling 2 hours, 50 minutes and 59 seconds of speech. The dataset includes 16 audio/text mapping files (mapping.tsv), each containing aligned audio/sentence pairs for the corresponding session, with 2,488 aligned pairs in total. Each TSV file contains the following fields: audio_filename, key, sentence, attempts.
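The mapping files described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the function names read_mapping and load_dataset are hypothetical, the directory layout (one mapping.tsv somewhere under each session folder) is an assumption, and so is the presence of a header row, which can be switched off with has_header=False.

```python
import csv
from pathlib import Path

# Field names as listed in this dataset card.
FIELDS = ["audio_filename", "key", "sentence", "attempts"]


def read_mapping(stream, has_header=True):
    """Yield one dict per data row of a mapping.tsv-style text stream.

    Assumption: rows are tab-separated with exactly four columns and
    no quoting; a first header row is skipped when has_header is True.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if index == 0 and has_header:
            continue
        if not row:  # skip blank lines
            continue
        if len(row) != len(FIELDS):
            raise ValueError(f"row {index + 1}: expected {len(FIELDS)} fields, got {len(row)}")
        yield dict(zip(FIELDS, row))


def load_dataset(root, has_header=True):
    """Collect rows from every mapping.tsv found under root (assumed layout)."""
    rows = []
    for path in sorted(Path(root).rglob("mapping.tsv")):
        with open(path, encoding="utf-8", newline="") as handle:
            rows.extend(read_mapping(handle, has_header=has_header))
    return rows


# Example usage (the path is a placeholder):
# rows = load_dataset("path/to/dataset")
# print(len(rows))  # the card reports 2,488 aligned pairs in total
```

Comparing the number of loaded rows with the 2,488 pairs reported here is a simple way to confirm that all 16 mapping files were found and parsed.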

Sample

Audio filename	Sentence
333399be5fc74b7c408d9d70e14b6048.mp3	ZAŊ TI PIRAMƐRI SHIKURITI kpa yila maa gbunni mini haŋkali shɛli di ni mali.
e8d89515d173d19c371cc7f3608a6dab.mp3	Bɔ ka nachimba maa lee niŋda?
cb818087e0730aa70d2c8b7a3bc44630.mp3	Yi tɛhiya ni bɛ suhu gbaai bɛ ni niŋdi shɛli maa?
78e82ad98c5081053891accc6f970a77.mp3	Yilimiya yila maa ka buɣisima.
7b105d736141d508b7e9614649e5b299.mp3	N-yili bia yolibu yila/biyoliyila ni kumsi din lu n-zahim ka zamzam zaŋ kpa bia yolibu yila/biyoliyila daan- faannima ni nyɛ
0ac17072189a293ba7f91492943c427d.mp3	shɛŋa zaŋ n-ti bilɛɣu
e5a981e04602f749df51890fdf880567.mp3	bia ma ni biyola.
afbf7b31118d2cf95432ef1c9b62ae02.mp3	A ma chaŋla Tuunaayili sima piɛbu
751e9ab8b141c600501da74dbf827917.mp3	N-sabi lahibali shɛli bɛ ni karim kolivaai ni nyɛ shɛli.
78fd6809ce3e02de7573fe3a33933aff.mp3	Gbana sabbu mini talifoon nyɛla lahibali wuligibu soya.