License: NOODL-1.0
Steward: Institute of African Digital Humanities
Task: TTS
Release Date: 4/16/2026
Format: MP3, TSV
Size: 319.05 MB
This dataset comprises audio recordings of Yoruba speech aligned with textual transcriptions. It is organized into 17 folders, each containing MP3 audio files and a tab-separated mapping file that pairs every audio file with its transcription. The audio clips are short, typically ranging from 3 to 27 seconds, and are suitable for training and evaluating Text-to-Speech (TTS) systems. The textual content originates from a variety of written and spoken sources in Yoruba, including narrative texts, conversational exchanges, opinion and commentary content, and everyday speech samples. These texts were segmented into short utterances suitable for read speech and TTS modelling.
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Restrictions/Special Constraints
- For research and scientific use only
- You agree not to re-host or redistribute this dataset
Forbidden Usage
You agree not to use the data for:
- Generative AI
- Voice cloning or speaker imitation
- Reproduction, duplication, modification, or redistribution
- Commercial use without explicit permission
Intended Use
This dataset is intended for the training and evaluation of Text-to-Speech (TTS) systems for the Yoruba language. It aims to support:
- Language technology development for one of Africa's most widely spoken indigenous languages
- Development of speech technologies for under-served African language communities
- Educational applications in multilingual and multilectal contexts
- Research in low-resource and African language speech synthesis
Yoruba (native name: Èdè Yorùbá) is a Niger-Congo language of the Volta-Niger branch, spoken primarily in southwestern Nigeria and parts of Benin and Togo. It is one of the three major indigenous languages of Nigeria, alongside Igbo and Hausa. Yoruba is also historically represented across the African diaspora, particularly in Brazil, Cuba, Trinidad, and other parts of the Americas, where its influence has been preserved through religious and cultural traditions.
Yoruba is spoken by an estimated 40 to 50 million people as a first language, with many more speaking it as a second language. It is the native language of the Yoruba people, one of the largest ethnic groups in Africa, concentrated in the Nigerian states of Lagos, Ogun, Oyo, Osun, Ondo, Ekiti, Kwara, and Kogi, as well as in parts of the Republic of Benin and Togo. Yoruba has a strong literary tradition and a vibrant oral heritage, and it is the medium of instruction in early education in Yoruba-speaking areas of Nigeria.
Yoruba is a tonal language, with three phonemic tones — high (´), mid (unmarked), and low (`) — that distinguish meaning at the lexical and grammatical levels. This tonal structure is a defining feature of the language and is systematically represented in the standard orthography.
Yoruba is a dialect continuum with a widely recognized Standard Yoruba variety used in education, media, and literature. Major regional dialect groups include:
Central (Oyo) dialects:
Oyo Yoruba — the basis of Standard Yoruba; spoken in Oyo State; forms the prestige variety used in formal writing, broadcasting, and education
Northwestern dialects:
Egba — spoken in Abeokuta area (Ogun State); notable for phonological differences in vowel harmony and consonant clusters
Ijebu — spoken in Ijebu Ode area (Ogun State); distinguished by specific lexical and phonological features
Eastern dialects:
Ekiti — spoken in Ekiti State; strongly divergent from Standard Yoruba in phonology, lexicon, and some grammatical structures
Ondo — spoken in Ondo State; intermediate between Ekiti and central varieties
Ijesa (Ilesa) — spoken in Osun State; exhibits unique tonal and vowel patterns
Northern dialects:
Igbomina — spoken in northern Kwara and Osun; considered conservative in retaining older Yoruba features
Okun — spoken in Kogi State; transitional dialect showing influence from neighboring languages
Urban variety:
Lagos Yoruba — urban variety spoken in Nigeria's commercial capital; cosmopolitan, incorporating borrowings from English, Hausa, Igbo, and Nigerian Pidgin; widely used in popular music, social media, and youth culture
The variety represented in this dataset reflects the standard spoken variety of Yoruba, drawing on the common spoken register widely intelligible across Yoruba-speaking communities.
Yoruba has two major written traditions: the modern standard orthography and the older traditional missionary orthography.
Modern standard Yoruba orthography was formalized through academic and governmental efforts and is used in contemporary official publications, school curricula, and most current printed Yoruba-language materials. It employs an augmented Latin alphabet of 25 letters:
a, b, d, e, ẹ, f, g, gb, h, i, j, k, l, m, n, o, ọ, p, r, s, ṣ, t, u, w, y
Key features of the modern standard orthography include sub-dotted vowels (ẹ, ọ) to distinguish open-mid from close-mid vowels, a sub-dotted consonant (ṣ) for the postalveolar fricative [ʃ], and systematic tonal diacritics (acute accent for high tone, grave accent for low tone, unmarked for mid tone) on vowels and syllabic nasals.
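The letter inventory above can be illustrated with a small sketch that splits a single word into letters of the 25-letter alphabet, treating the digraph gb as one unit and ignoring tone marks. The function name `split_letters` and the decision to reject any character outside the alphabet are assumptions made for illustration, not part of the dataset or of any Yoruba tooling.

```python
import unicodedata

# The 25 letters of the modern standard alphabet as listed above.
# "gb" is a digraph that counts as a single letter.
ALPHABET = [unicodedata.normalize("NFC", letter) for letter in [
    "a", "b", "d", "e", "ẹ", "f", "g", "gb", "h", "i", "j", "k", "l",
    "m", "n", "o", "ọ", "p", "r", "s", "ṣ", "t", "u", "w", "y",
]]
ALPHABET_SET = set(ALPHABET)

# Combining grave (low tone) and combining acute (high tone).
TONE_MARKS = {"\u0300", "\u0301"}


def split_letters(word):
    """Split one word into alphabet letters, dropping tone marks only."""
    # Decompose so tone marks become separate combining characters,
    # remove them, then recompose so ẹ, ọ and ṣ are single code points.
    decomposed = unicodedata.normalize("NFD", word.lower())
    toneless = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    text = unicodedata.normalize("NFC", toneless)

    letters = []
    i = 0
    while i < len(text):
        if text.startswith("gb", i):
            letters.append("gb")
            i += 2
        elif text[i] in ALPHABET_SET:
            letters.append(text[i])
            i += 1
        else:
            raise ValueError(f"character {text[i]!r} is not in the alphabet")
    return letters


if __name__ == "__main__":
    print(split_letters("gbogbo"))
    print(split_letters("Yorùbá"))
```

The sketch handles one word at a time; spaces, punctuation, and any mid-tone macron would need to be handled before calling it.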
Traditional missionary Yoruba orthography, by contrast, was the earliest written form of Yoruba, developed in the 19th century primarily through the work of Samuel Ajayi Crowther — the first African Anglican bishop, who produced a Yoruba grammar (1843) and was the principal translator of the Yoruba Bible (1884). This orthography predates the standardization effort and uses a simpler Latin alphabet without subscript diacritics or systematic tone marking. Key features include:
No tone marks on vowels or syllabic nasals; tones are left to contextual and phonological inference
Plain e and o used for both close-mid (e, o) and open-mid (ẹ, ọ) vowels, without subscript dots
Plain s used for both the alveolar fricative [s] and the postalveolar fricative [ʃ] (where modern orthography distinguishes s and ṣ)
The digraph gb is retained for the labial-velar stop, as in the modern orthography
Double vowels (e.g., ee, oo, ii) and repeated nasals (e.g., nn) are used in some positions
The transcriptions in this dataset are written in the traditional missionary Yoruba orthography, without tone marks or subscript-dotted letters. This reflects the written tradition of the source texts from which the speaker read, which were drawn from older Yoruba print materials in this orthographic tradition.
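Because the transcriptions carry no tone marks or sub-dots, text written in the modern standard orthography has to be reduced to plain letters before it can be compared with them. The following is a minimal sketch of that reduction using Unicode decomposition; the function name `to_unmarked` is hypothetical. It only removes diacritics and does not reproduce other spelling conventions of the older orthography, such as doubled vowels.

```python
import unicodedata


def to_unmarked(text):
    """Remove tone marks and sub-dots, leaving plain Latin letters."""
    # NFD splits each accented letter into a base letter followed by
    # combining marks (Unicode category "Mn"), which are then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(to_unmarked("Ó ń lọ"))
    # Words that differ only in tone collapse to the same spelling.
    print(to_unmarked("ìgbà") == to_unmarked("igbá"))
```

The reduction is lossy: words distinguished only by tone or by a sub-dot become identical, so the original marking cannot be recovered from the output.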
Yoruba is an isolating, Subject-Verb-Object (SVO) language with a rich system of tonal contrasts operating at both the lexical and grammatical levels. Key grammatical features include:
Tonal grammar:
Tone functions not only to distinguish lexical meanings (e.g., "igba" = 200 vs. "ìgbà" = time/season vs. "igbá" = calabash) but also marks grammatical categories such as tense, aspect, and focus
Tense-Aspect system:
"máa" or "ń" — marks habitual/continuous aspect (e.g., "Ó ń lọ" = "He/she is going")
"ti" — marks perfective aspect (e.g., "Mo ti lọ" = "I have gone")
"yóò" or "á" — marks future tense (e.g., "Ó yóò wá" = "He/she will come")
No inflectional verb morphology; verbs do not conjugate for person or number
Pronominal system:
Pronouns distinguish number but not grammatical gender: "ó" (he/she/it), "wọn" (they), "a/àwa" (we), "ẹ/ẹ̀yín" (you plural), "mo/èmi" (I)
Copula and focus:
"ni" functions as a copula and focus marker in cleft and identificational constructions
"jẹ" functions as a full copula in predicative constructions
Serial verb constructions:
Multiple verbs are commonly chained in sequence without overt connectives, a typologically characteristic feature of Volta-Niger languages
Negation:
"kò" or "ò" precedes the verb for negation (e.g., "Ó kò lọ" = "He/she did not go")
The textual material in this dataset originates from a variety of Yoruba written and transcribed sources, including narrative texts, opinion and commentary-based content, social discourse material, and everyday speech samples. The texts include material drawn from Yoruba-language journalism and editorial writing, reflecting authentic written Yoruba. The texts were segmented into short utterances suitable for read speech and used as prompts for audio recording sessions.
This dataset is derived from prompted read speech. The speaker read aloud pre-written Yoruba texts drawn from narrative, conversational, and opinion-based sources. The content covers a range of registers and everyday topics typical of spoken Yoruba, including civic commentary, social observation, personal reflection, and general discourse.
The dataset has been structured as segmented, read-style speech suitable for speech synthesis tasks.
The dataset is composed of 17 folders containing audio clips and corresponding mapping files.
Each folder contains between 11 and 188 audio files. Individual audio clips typically range from 3 to 27 seconds in duration.
Folder-level durations range from approximately 2 minutes to over 40 minutes of audio.
The dataset represents a total of 1,942 audio files with a combined duration of approximately 5 hours 50 minutes and 39 seconds of segmented Yoruba speech.
A detailed breakdown of durations and file counts per folder is provided below.
| Folder | Files | Duration |
|---|---|---|
| tts_yoruba_dataset_01_175clips_2193s_20260408-1127 | 175 | 31m 17s |
| tts_Yoruba_02_dataset_175clips_1942s_20260409-2356 | 175 | 29m 11s |
| tts_Yoruba_03_dataset_175clips_1856s_20260411-0030 | 171 | 27m 32s |
| tts_Yoruba_dataset_04_175clips_1929s_20260411-1341 | 175 | 28m 40s |
| tts_Yoruba_dataset_05_175clips_2078s_20260412-1144 | 146 | 23m 55s |
| tts_Yoruba_dataset_06_175clips_1730s_20260412-1505 | 175 | 26m 00s |
| tts_Yoruba_dataset_07_25clips_245s_20260412-1549 | 25 | 3m 45s |
| tts_Yoruba_dataset_8_48clips_722s_20260413-1939 | 48 | 11m 07s |
| tts_Yoruba_dataset_9_49clips_606s_20260413-2314 | 49 | 9m 36s |
| tts_yoruba_dataset_10_81clips_861s_20260414-1511 | 81 | 13m 34s |
| tts_Yoruba_dataset_11_99clips_1278s_20260414-1800 | 99 | 18m 47s |
| tts_Yoruba_dataset_12_45clips_568s_20260414-1817 | 45 | 8m 28s |
| tts_Yoruba_dataset_13_b_77clips_1038s_20260415-1632 | 77 | 15m 49s |
| tts_Yoruba_dataset_14_151clips_1935s_20260415-1715 | 151 | 30m 06s |
| tts_Yoruba_dataset_14a_151clips_1935s_20260415-1716 | 151 | 30m 06s |
| tts_Yoruba_dataset_14b_188clips_2636s_20260415-2147 | 188 | 40m 35s |
| tts_Yoruba_dataset_15_11clips_127s_20260415-1652 | 11 | 2m 04s |
| GRAND TOTAL | 1,942 | 5h 50m 39s |
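The totals in the table can be re-derived from the per-folder rows. The sketch below copies the file counts and durations from the table and sums them; the helper names are illustrative. The summed duration may differ from the stated grand total by a few seconds, presumably because the per-folder figures are rounded, which is an assumption rather than something the source states.

```python
import re

# (files, duration) for each of the 17 folders, in table order.
ROWS = [
    (175, "31m 17s"), (175, "29m 11s"), (171, "27m 32s"),
    (175, "28m 40s"), (146, "23m 55s"), (175, "26m 00s"),
    (25, "3m 45s"), (48, "11m 07s"), (49, "9m 36s"),
    (81, "13m 34s"), (99, "18m 47s"), (45, "8m 28s"),
    (77, "15m 49s"), (151, "30m 06s"), (151, "30m 06s"),
    (188, "40m 35s"), (11, "2m 04s"),
]

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a string such as '5h 50m 39s' or '31m 17s' to seconds."""
    return sum(
        int(number) * UNIT_SECONDS[unit]
        for number, unit in re.findall(r"(\d+)\s*([hms])", text)
    )


def format_duration(seconds):
    """Format a number of seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


total_files = sum(files for files, _ in ROWS)
total_seconds = sum(to_seconds(duration) for _, duration in ROWS)

if __name__ == "__main__":
    print("folders:", len(ROWS))
    print("files:", total_files)
    print("summed duration:", format_duration(total_seconds))
```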
Each folder in the dataset contains:
A collection of audio files in MP3 format
A tab-separated mapping file linking each audio file to its transcription
Each line in the mapping file consists of four tab-separated fields:
audio_filename.mp3<TAB>key<TAB>sentence<TAB>attempts
The dataset is designed for TTS pipelines requiring paired audio-text data.
Sample entries (audio file | transcription):
e6503d0affe22118b3bc14a1aa244684.mp3 | Kini o fe so nipa ikowoje ti ko mo eya mo ni orile-ede yii?
9e477f34a0ba6f7196d62c6333af1161.mp3 | Bawo ni yoo se maa wo idagbasoke ton ba apo awon oludari apa ariwa Naijiria
4916b1a822addabf960de77e3e970c37.mp3 | Ojuse wa gege bi ara ilu ni lati maa wa igbe aye to daade fun ara wa laini ro bi awon oloseluse se ba wa tan.
936c2fbdd7112457e7cbc16f9486db15.mp3 | Adupe pupo lowo gbogbo awon ti won npe wa ni ti gidi lati fi erongba won han lori awon ohun ti a n ko ninu iwe Iroyin-Owuro.
48f3966dd24f7ca54d4b0cbc4282c8a3.mp3 | Ilu Eko tii n jo oun to dorikodo bayi.
f0bea3568f28ae9b6ec7c5a7ab5d1cf7.mp3 | Ojoojumo ni ona wa n baje si ni ipinle Eko, gbogbo re dake roro bi enipe ko si olori ni aganyin.
3779984145d73507bcd8c14b09fd7179.mp3 | Oun si lo ye gbogbo awa olugbe Afrika, awa omo Naijiria.
22df44c7cea8408f23e16e3e035b5032.mp3 | Igbe aye to dun, to layo, to si larinrin laisi ifoya dun pupo lati maa gbe.
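The mapping-file format described above can be read with Python's standard `csv` module. This is a minimal sketch under several assumptions: the mapping file has a `.tsv` extension, the four fields appear in the order filename, key, sentence, attempts, and any row whose first field does not end in `.mp3` (for example a header row) can be skipped. The function names and field labels are illustrative and not defined by the dataset.

```python
import csv
from pathlib import Path

# Assumed field order, taken from the format line in this card.
FIELDS = ["audio_filename", "key", "sentence", "attempts"]


def read_mapping(tsv_path):
    """Return the rows of one mapping file as a list of dicts."""
    rows = []
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_no, fields in enumerate(reader, start=1):
            if not fields:
                continue  # blank line
            if not fields[0].endswith(".mp3"):
                continue  # assumed header or other non-data row
            if len(fields) != len(FIELDS):
                raise ValueError(
                    f"{tsv_path}:{line_no}: expected {len(FIELDS)} fields, "
                    f"got {len(fields)}"
                )
            rows.append(dict(zip(FIELDS, fields)))
    return rows


def audio_text_pairs(folder):
    """Yield (audio path, transcription) for every row in a folder."""
    folder = Path(folder)
    for tsv_path in sorted(folder.glob("*.tsv")):
        for row in read_mapping(tsv_path):
            yield folder / row["audio_filename"], row["sentence"]
```

Calling `list(audio_text_pairs(folder))` on one of the 17 folders would give the paired audio paths and sentences that a TTS training pipeline typically expects; whether each referenced MP3 file exists should be checked separately.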