Yoruba-TTS-Dataset

Language

Yoruba (native name: Èdè Yorùbá) is a Niger-Congo language of the Volta-Niger branch, spoken primarily in southwestern Nigeria and in parts of Benin and Togo. It is one of Nigeria's three major indigenous languages, alongside Hausa and Igbo. Yoruba is also represented across the African diaspora, particularly in Brazil, Cuba, Trinidad, and other parts of the Americas, where its influence has been preserved through religious and cultural traditions.

Yoruba is spoken by an estimated 40 to 50 million people as a first language, with many more speaking it as a second language. It is the native language of the Yoruba people, one of the largest ethnic groups in Africa, concentrated in the Nigerian states of Lagos, Ogun, Oyo, Osun, Ondo, Ekiti, Kwara, and Kogi, as well as in parts of the Republic of Benin and Togo. Yoruba has a strong literary tradition and a vibrant oral heritage, and it is the medium of instruction in early education in Yoruba-speaking areas of Nigeria.

Yoruba is a tonal language, with three phonemic tones — high (´), mid (unmarked), and low (`) — that distinguish meaning at the lexical and grammatical levels. This tonal structure is a defining feature of the language and is systematically represented in the standard orthography.

Variants

Yoruba is a dialect continuum with a widely recognized Standard Yoruba variety used in education, media, and literature. Major regional dialect groups include:

Central (Oyo) dialects:

Oyo Yoruba — the basis of Standard Yoruba; spoken in Oyo State; forms the prestige variety used in formal writing, broadcasting, and education

Northwestern dialects:

Egba — spoken in Abeokuta area (Ogun State); notable for phonological differences in vowel harmony and consonant clusters
Ijebu — spoken in Ijebu Ode area (Ogun State); distinguished by specific lexical and phonological features

Eastern dialects:

Ekiti — spoken in Ekiti State; strongly divergent from Standard Yoruba in phonology, lexicon, and some grammatical structures
Ondo — spoken in Ondo State; intermediate between Ekiti and central varieties
Ijesa (Ilesa) — spoken in Osun State; exhibits unique tonal and vowel patterns

Northern dialects:

Igbomina — spoken in northern Kwara and Osun; considered conservative in retaining older Yoruba features
Okun — spoken in Kogi State; transitional dialect showing influence from neighboring languages

Urban variety:

Lagos Yoruba — urban variety spoken in Nigeria's commercial capital; cosmopolitan, incorporating borrowings from English, Hausa, Igbo, and Nigerian Pidgin; widely used in popular music, social media, and youth culture

The recordings in this dataset use Standard Yoruba in its common spoken register, which is widely intelligible across Yoruba-speaking communities.

Writing System

Yoruba has two major written traditions: the modern standard orthography and the older traditional missionary orthography.

Modern standard Yoruba orthography was formalized through academic and governmental efforts and is used in contemporary official publications, school curricula, and most current printed Yoruba-language materials. It employs an augmented Latin alphabet of 25 letters:

a, b, d, e, ẹ, f, g, gb, h, i, j, k, l, m, n, o, ọ, p, r, s, ṣ, t, u, w, y

Key features of the modern standard orthography include sub-dotted vowels (ẹ, ọ) to distinguish open-mid from close-mid vowels, a sub-dotted consonant (ṣ) for the postalveolar fricative [ʃ], and systematic tonal diacritics (acute accent for high tone, grave accent for low tone, unmarked for mid tone) on vowels and syllabic nasals.
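The tone-marking convention described above can be illustrated with a minimal Python sketch that reads the tone of each vowel from its diacritic. The function name tone_pattern is hypothetical, and the sketch is deliberately simplified: it only looks at the five base vowels and ignores tone-bearing syllabic nasals, so it is an illustration of the convention rather than a full tone analyzer.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent: high tone
GRAVE = "\u0300"  # combining grave accent: low tone
VOWELS = set("aeiou")


def tone_pattern(word: str) -> str:
    """Return one letter (H, M or L) per vowel of a tone-marked word.

    Simplifying assumption: syllabic nasals are not counted.
    """
    tones = []
    on_vowel = False  # True while the most recent base letter is a vowel
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            # Combining mark: only tone accents attached to a vowel matter;
            # the dot below (as in ẹ, ọ) is skipped.
            if on_vowel and ch == ACUTE:
                tones[-1] = "H"
            elif on_vowel and ch == GRAVE:
                tones[-1] = "L"
        else:
            on_vowel = ch in VOWELS
            if on_vowel:
                tones.append("M")  # unmarked vowels carry mid tone
    return "".join(tones)


print(tone_pattern("igba"))  # MM
print(tone_pattern("ìgbà"))  # LL
print(tone_pattern("igbá"))  # MH
```

Decomposing to NFD first means the sketch handles precomposed and decomposed input the same way, including vowels that carry both a dot below and a tone accent.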

Traditional missionary Yoruba orthography, by contrast, was the earliest written form of Yoruba, developed in the 19th century primarily through the work of Samuel Ajayi Crowther — the first African Anglican bishop, who produced a Yoruba grammar (1843) and was the principal translator of the Yoruba Bible (1884). This orthography predates the standardization effort and uses a simpler Latin alphabet without subscript diacritics or systematic tone marking. Key features include:

No tone marks on vowels or syllabic nasals; tones are left to contextual and phonological inference
Plain e and o used for both close-mid (e, o) and open-mid (ẹ, ọ) vowels, without subscript dots
Plain s used for both the alveolar fricative [s] and the postalveolar fricative [ʃ] (where modern orthography distinguishes s and ṣ)
The digraph gb is retained for the labial-velar stop, consistent with both systems
Double vowels (e.g., ee, oo, ii) and repeated nasals (e.g., nn) are used in some positions

The transcriptions in this dataset are written in the traditional missionary Yoruba orthography, without tone marks or subscript-dotted letters. This reflects the written tradition of the source texts from which the speaker read, which were drawn from older Yoruba print materials in this orthographic tradition.
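When text written in the modern standard orthography has to be compared with, or converted to, the undiacritized form used in these transcriptions, one possible approach is to strip all combining marks. The sketch below is an assumption about how a user might do this, not a description of how the dataset was prepared; the function name to_plain_orthography is hypothetical.

```python
import unicodedata


def to_plain_orthography(text: str) -> str:
    """Remove tone marks and sub-dots from Yoruba text.

    The conversion is lossy: e/ẹ, o/ọ and s/ṣ collapse to e, o and s,
    and all tone information is discarded.
    """
    # NFD splits each letter into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (acute, grave, dot below, ...).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


print(to_plain_orthography("Èdè Yorùbá"))  # Ede Yoruba
print(to_plain_orthography("Ó ń lọ"))      # O n lo
```

Because the mapping cannot be reversed, it is only suitable for normalizing input toward the dataset's convention, not for restoring diacritics.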

Grammar and Linguistic Features

Yoruba is an isolating, Subject-Verb-Object (SVO) language with a rich system of tonal contrasts operating at both the lexical and grammatical levels. Key grammatical features include:

Tonal grammar:

Tone functions not only to distinguish lexical meanings (e.g., "igba" = 200 vs. "ìgbà" = time/season vs. "igbá" = calabash) but also marks grammatical categories such as tense, aspect, and focus

Tense-Aspect system:

"máa" or "ń" — marks habitual/continuous aspect (e.g., "Ó ń lọ" = "He/she is going")
"ti" — marks perfective aspect (e.g., "Mo ti lọ" = "I have gone")
"yóò" or "á" — marks future tense (e.g., "Yóò wá" = "He/she will come"; the third-person singular pronoun "ó" is omitted before "yóò")
No inflectional verb morphology; verbs do not conjugate for person or number

Pronominal system:

Pronouns distinguish number but not grammatical gender: "mo/èmi" (I), "o/ìwọ" (you singular), "ó/òun" (he/she/it), "a/àwa" (we), "ẹ/ẹ̀yin" (you plural), "wọ́n/àwọn" (they)

Copula and focus:

"ni" functions as a copula and focus marker in cleft and identificational constructions
"jẹ́" functions as a full copula in predicative constructions

Serial verb constructions:

Multiple verbs are commonly chained in sequence without overt connectives, a typologically characteristic feature of Volta-Niger languages

Negation:

"kò" or "ò" precedes the verb for negation (e.g., "Kò lọ" = "He/she did not go"; the third-person singular pronoun "ó" is omitted before "kò")

Source

The textual material in this dataset originates from a variety of Yoruba written and transcribed sources, including narrative texts, opinion and commentary, social discourse material, everyday speech samples, and Yoruba-language journalism and editorial writing. The texts were segmented into short utterances suitable for read speech and used as prompts for the audio recording sessions.

Domain

This dataset is derived from prompted read speech. The speaker read aloud pre-written Yoruba texts drawn from narrative, conversational, and opinion-based sources. The content covers a range of registers and everyday topics typical of spoken Yoruba, including civic commentary, social observation, personal reflection, and general discourse.

The dataset has been structured as segmented, read-style speech suitable for speech synthesis tasks.

Size

The dataset is composed of 17 folders containing audio clips and corresponding mapping files.

Each folder contains between 11 and 188 audio files. Individual audio clips typically range from 3 to 27 seconds in duration.

Folder-level durations range from approximately 2 minutes to over 40 minutes of audio.

The dataset represents a total of 1,942 audio files with a combined duration of approximately 5 hours 50 minutes and 39 seconds of segmented Yoruba speech.

A detailed breakdown of durations and file counts per folder is provided below.

Folder	Files	Duration
tts_yoruba_dataset_01_175clips_2193s_20260408-1127	175	31m 17s
tts_Yoruba_02_dataset_175clips_1942s_20260409-2356	175	29m 11s
tts_Yoruba_03_dataset_175clips_1856s_20260411-0030	171	27m 32s
tts_Yoruba_dataset_04_175clips_1929s_20260411-1341	175	28m 40s
tts_Yoruba_dataset_05_175clips_2078s_20260412-1144	146	23m 55s
tts_Yoruba_dataset_06_175clips_1730s_20260412-1505	175	26m 00s
tts_Yoruba_dataset_07_25clips_245s_20260412-1549	25	3m 45s
tts_Yoruba_dataset_8_48clips_722s_20260413-1939	48	11m 07s
tts_Yoruba_dataset_9_49clips_606s_20260413-2314	49	9m 36s
tts_yoruba_dataset_10_81clips_861s_20260414-1511	81	13m 34s
tts_Yoruba_dataset_11_99clips_1278s_20260414-1800	99	18m 47s
tts_Yoruba_dataset_12_45clips_568s_20260414-1817	45	8m 28s
tts_Yoruba_dataset_13_b_77clips_1038s_20260415-1632	77	15m 49s
tts_Yoruba_dataset_14_151clips_1935s_20260415-1715	151	30m 06s
tts_Yoruba_dataset_14a_151clips_1935s_20260415-1716	151	30m 06s
tts_Yoruba_dataset_14b_188clips_2636s_20260415-2147	188	40m 35s
tts_Yoruba_dataset_15_11clips_127s_20260415-1652	11	2m 04s
GRAND TOTAL	1,942	5h 50m 39s
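
The table above can be re-checked with a short Python sketch that converts the duration strings to seconds and sums them. The helper names are hypothetical, and the sketch assumes each row is tab-separated in the order folder, files, duration. The per-folder durations are printed to whole seconds, so a sum over the rows may differ from the stated grand total by a few seconds, presumably because of rounding.

```python
import re
from typing import Iterable, Tuple

# Matches strings such as "31m 17s", "2m 04s" or "5h 50m 39s".
_DURATION = re.compile(r"(?:(\d+)h\s*)?(?:(\d+)m\s*)?(?:(\d+)s)?")


def parse_duration(text: str) -> int:
    """Convert a duration string from the table into seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def format_duration(total_seconds: int) -> str:
    """Format seconds in the same style as the table."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}h {minutes:02d}m {seconds:02d}s"
    return f"{minutes}m {seconds:02d}s"


def summarize(rows: Iterable[str]) -> Tuple[int, int]:
    """Sum file counts and durations over tab-separated table rows."""
    total_files = 0
    total_seconds = 0
    for row in rows:
        _folder, files, duration = row.rstrip("\n").split("\t")
        total_files += int(files.replace(",", ""))
        total_seconds += parse_duration(duration)
    return total_files, total_seconds


# Two rows copied from the table above.
rows = [
    "tts_Yoruba_dataset_07_25clips_245s_20260412-1549\t25\t3m 45s",
    "tts_Yoruba_dataset_15_11clips_127s_20260415-1652\t11\t2m 04s",
]
files, seconds = summarize(rows)
print(files, format_duration(seconds))  # 36 5m 49s
```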

Structure

Each folder in the dataset contains:

A collection of audio files in MP3 format
A tab-separated mapping file linking each audio file to its transcription

Each line in the mapping file contains four tab-separated fields, in the following order:

audio_filename.mp3 key sentence attempts

The dataset is designed for TTS pipelines requiring paired audio-text data.
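
A mapping file in this format could be loaded with a sketch like the following. It assumes four tab-separated fields per line in the order listed above and no header row. The Entry type and read_mapping function are hypothetical names, and the attempts field is kept as a string because its meaning is not documented here.

```python
import csv
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    audio_filename: str
    key: str
    sentence: str
    attempts: str  # kept as text; its semantics are not specified


def read_mapping(lines: Iterable[str]) -> List[Entry]:
    """Parse tab-separated mapping lines into Entry records."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        entries.append(Entry(*(field.strip() for field in row)))
    return entries


# Illustrative line only; the key and attempts values are made up.
sample = ["example.mp3\tk001\tIlu Eko tii n jo oun to dorikodo bayi.\t1\n"]
for entry in read_mapping(sample):
    print(entry.audio_filename, "|", entry.sentence)
```

To read a real file, pass an open file object, for example open(path, encoding="utf-8", newline=""), where path points to the mapping file of one folder.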

Sample

e6503d0affe22118b3bc14a1aa244684.mp3 | Kini o fe so nipa ikowoje ti ko mo eya mo ni orile-ede yii?
9e477f34a0ba6f7196d62c6333af1161.mp3 | Bawo ni yoo se maa wo idagbasoke ton ba apo awon oludari apa ariwa Naijiria
4916b1a822addabf960de77e3e970c37.mp3 | Ojuse wa gege bi ara ilu ni lati maa wa igbe aye to daade fun ara wa laini ro bi awon oloseluse se ba wa tan.
936c2fbdd7112457e7cbc16f9486db15.mp3 | Adupe pupo lowo gbogbo awon ti won npe wa ni ti gidi lati fi erongba won han lori awon ohun ti a n ko ninu iwe Iroyin-Owuro.
48f3966dd24f7ca54d4b0cbc4282c8a3.mp3 | Ilu Eko tii n jo oun to dorikodo bayi.
f0bea3568f28ae9b6ec7c5a7ab5d1cf7.mp3 | Ojoojumo ni ona wa n baje si ni ipinle Eko, gbogbo re dake roro bi enipe ko si olori ni aganyin.
3779984145d73507bcd8c14b09fd7179.mp3 | Oun si lo ye gbogbo awa olugbe Afrika, awa omo Naijiria.
22df44c7cea8408f23e16e3e035b5032.mp3 | Igbe aye to dun, to layo, to si larinrin laisi ifoya dun pupo lati maa gbe.
