License:
CC-BY-NC-SA-4.0
Steward:
Community
Dataset ID:
cmpo53l5b000pku07cem453ac
Task: TTS
Release Date: 5/27/2026
Format: WEBM, TSV
Size: 281.30 MB
The Jombang dialect is part of the Arekan (East Javanese) dialect spoken in East Java Province, Indonesia. It holds a unique position, however, because it sits at the meeting point of two cultural influences, "Mataraman" (Solo-Yogyakarta) and "Arekan" (Surabaya-Malang), which together span Yogyakarta, Central Java, and East Java Province, Indonesia.
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Restrictions/Special Constraints
Research and scientific use must cite the source. For AI training purposes, please contact the dataset owner to request permission. Any use of the data that could have a negative impact is prohibited, as is commercial use.
Forbidden Usage
Do not attempt to determine the identity of speakers in the dataset. Any attempt to clone the voice or to train models that imitate the speakers in this dataset is forbidden, as is using this dataset to train chatbots or large language models.
Ethical Review
This dataset was created by writing texts in the Jombang dialect of Javanese, with code-mixing in Indonesian and English. The texts were read aloud and recorded by a native speaker through the hosting platform https://sabre-2.onrender.com/. The resulting audio recordings were then compiled into a single dataset.
Intended Use
This dataset is intended for building automatic speech recognition systems and for comparative linguistic studies of regional language dialects, especially Javanese in Indonesia.
Jombang dialect of Javanese spoken in East Java Province, Indonesia.
Native speaker of Javanese.
Created by the dataset owner, who is considered a linguist and a native speaker.
General domain, including tourism and travel, social interaction, and environment.
5 hours
Fitriyanti, Amalia Ilmi. (2026). Jombang Dialect-Javanese TTS [Data set]. Mozilla Data Collective. URL [dataset link].
Audio file name, text
“Biyen pas dolan nang bali iku dadi salah siji pengalamanku sing paling angel tak lalekno.”
“Aku lan konco-konco ugo akeh foto-foto kanggo kenang-kenangan.”
“Miturutku nek aku iso kuliah nang Belanda kuwi bakal dadi pengalaman sing luar biasa banget.”
“Dadine aku iso bebas dolan nang macem-macem panggonan wisata kalebu taman tulip sing wis suwe tak pengeni yaiku Taman Keukenhof nang Lisse.”
“Soale hasil fotone iso tak posting nang media sosial lan dadi kenangan sing iso didelok maneh kapan wae.”
Latin alphabet (A–Z), Arabic numerals (0–9)
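The TSV fields (audio file name, text) and the character set listed above suggest a simple way to read and sanity-check the transcripts. The following is a minimal sketch, not the dataset's official loader. It assumes that each TSV row holds exactly those two fields in that order, that there is no header row, and that common punctuation is acceptable alongside the Latin letters and Arabic numerals. The file name `clip_0001.webm` and the function names are hypothetical.

```python
import csv
import io


def read_rows(tsv_text):
    """Parse tab-separated text into (audio_file_name, text) pairs.

    Assumption: two fields per row, no header row.
    """
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def unexpected_characters(text):
    """Return sorted characters outside A-Z/a-z, 0-9, whitespace, and
    a small punctuation set (the punctuation set is an assumption)."""
    allowed_punct = set(".,!?-'\"“”")
    return sorted(
        {
            ch
            for ch in text
            if not (ch.isascii() and ch.isalnum())
            and not ch.isspace()
            and ch not in allowed_punct
        }
    )


# Hypothetical row built from one of the sample sentences above.
sample = (
    "clip_0001.webm\t"
    "Aku lan konco-konco ugo akeh foto-foto kanggo kenang-kenangan.\n"
)

for file_name, text in read_rows(sample):
    print(file_name, unexpected_characters(text))
```

If the actual TSV includes a header row or extra columns, `read_rows` would need to skip or select them accordingly.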