License:
CC-BY-NC-SA-4.0
Steward:
Community
Dataset ID:
cmpwi0ac200zbo007hebmkzqy
Task: TTS
Release Date: 6/2/2026
Format: WEBM, TSV
Size: 282.84 MB
The Gresik dialect is a variety of Javanese spoken by the majority of people in Gresik city, East Java Province, Indonesia. It belongs to the ‘Arekan’ dialect group and has distinctive vocabulary unique to Gresik, which makes it slightly different from the Javanese varieties spoken in the Surabaya area. This dataset is designed to represent everyday use of the Gresik dialect of Javanese. A small subset of the dataset uses the formal register (Krama), while the majority uses the informal register (Ngoko) with instances of Indonesian and English code-mixing.
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
Restrictions/Special Constraints
This dataset is not for commercial use. It is intended for research and educational purposes only, and proper citation is required. Any use of this dataset requires written permission, obtained by submitting an access request with a clear statement of your intended use.
Forbidden Usage
Using this dataset to clone or imitate specific speakers, or to train chatbots or large language models, is strictly forbidden. Re-uploading or redistributing this dataset is prohibited.
Ethical Review
This dataset was created by writing texts in the Gresik dialect of Javanese with code-mixing in Indonesian and English. The texts were read and recorded by native speakers through the hosting platform (https://sabre-2.onrender.com/). All of the audio recordings were compiled into a comprehensive dataset.
Intended Use
This dataset is intended to support the preservation and dissemination of the Javanese language, particularly the Gresik dialect spoken in Gresik city, East Java Province, Indonesia, for educational and cultural research purposes.
This dataset contains Javanese in the Gresik dialect as used in daily activities, and includes code-mixing with Indonesian and English.
This dataset was created by the dataset owner, who is a native speaker of this language.
This dataset covers general domains such as culture, daily activities, social interactions, personal experiences, and social media usage.
Audio duration: 5 hours.
Nisa’, Rahmah Nabilatun. (2026). TTS Javanese-Gresik Dialect [Data set]. Mozilla Data Collective. URL [dataset link].
Fields: audio file name, text.
“Nek ditakoni krasan ta gak, yo jelas krasan seru.”
“Arep maem katah nggih rasane mboten eneg.”
“Yo sing relate ambek realitas dunia ae.”
“Tambah tuwek, rasane sang awak tambah ringkih ae.”
“Aku langsung nutup laptop dan gak bakal nyentuh kerjoan maneh.”
Latin alphabet (A–Z), Arabic numerals (0–9).
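The card lists two fields per record (audio file name, text) and a transcript character set of Latin letters and Arabic numerals. As a rough illustration only, the Python sketch below reads a two-column TSV and flags any letters or digits outside that set. The column order, the absence of a header row, and the file names clip_0001.webm and clip_0002.webm are assumptions made for this example and are not taken from the dataset; the two transcripts are samples quoted in this card.

```python
import csv
import io
import string

# Letters and digits the card says the transcripts use (A-Z, a-z, 0-9).
ALLOWED_ALNUM = set(string.ascii_letters + string.digits)


def read_rows(tsv_file):
    """Yield (audio_file_name, text) pairs from a two-column TSV stream.

    Assumption: column 1 is the audio file name, column 2 is the text.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield row[0], row[1]


def unexpected_chars(text):
    """Return the letters/digits in text that fall outside A-Z, a-z, 0-9.

    Punctuation and whitespace are ignored, since the card only
    describes the alphabet and the numerals.
    """
    return sorted({ch for ch in text if ch.isalnum() and ch not in ALLOWED_ALNUM})


# Hypothetical in-memory stand-in for the real TSV file.
sample = io.StringIO(
    "clip_0001.webm\tYo sing relate ambek realitas dunia ae.\n"
    "clip_0002.webm\tTambah tuwek, rasane sang awak tambah ringkih ae.\n"
)

rows = list(read_rows(sample))
for name, text in rows:
    print(name, unexpected_chars(text))
```

If the real TSV has a header row or a different column order, read_rows would need to be adjusted, for example by switching to csv.DictReader.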
For more information about the Gresik dialect, please visit Wikipedia: https://id.wikipedia.org/wiki/Bahasa_Jawa_Gresik . This dataset does not include the content of this link, and the license of this dataset does not apply to the Wikipedia page.