License: CC-BY-SA-4.0
Steward: Community
Dataset ID: cml9hmuis017yo407k0p4i0t4
Task: TTS
Release Date: 2/5/2026
Format: WEBM, TSV
Size: 309.99 MB
Betawi TTS of Cultural Language (BEKAL) is a text-to-speech (TTS) dataset that represents Betawi as a living, evolving language in the urban context of Jakarta, reflecting both traditional forms and the modern variations that emerge in everyday communication. It can be used for linguistic research, cultural documentation, urban sociolinguistic studies, and the development of language technologies for regional languages with Indonesian code-mixing.
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Restrictions/Special Constraints
Please contact the dataset owner to request permission. For research use, proper citation is required, and the dataset must not be distributed commercially.
Forbidden Usage
Re-uploading, modifying, or redistributing this dataset without the owner’s permission is prohibited.
Ethical Review
This dataset was created by writing texts in Betawi with Indonesian code-mixing. The texts were then read aloud and recorded by native speakers through the hosting platform https://sabre-2.onrender.com/, and the resulting audio recordings were compiled into a comprehensive dataset.
Intended Use
This dataset is designed to document, map, and analyze the variety of Betawi language speech across various domains of community life.
This dataset uses the Bekasi dialect of Betawi from West Java, combined with the urban Jakarta Indonesian code-mixing common among young people.
Created by the dataset creator's team, which consists of linguists and native speakers.
General domain, covering themes of culture, family, daily activities, education, and social interaction.
Approximately 5.5 hours of TTS audio.
Columns: audio file name, text
",,,makanye orang betawi sering dibilang gepyak dan semua tetanggenye dianggep sodara,,,"
",,,Orang kampung pade suka nandak-nandak ame nyawer kalo ade jaipongan,,,"
",,,Kadang-kadang ngeliatin orang yang lagi bebiakan,,,"
",,, kalian tepinin lantai rumeh kering,,,"
",,, kaye ngumpul aje di bale baringan maen hp sama ngopi kopi item,,,"
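As a rough illustration of how the two columns might be paired with the WEBM recordings, the following Python sketch parses the metadata into records. It assumes a UTF-8, tab-separated file with the audio file name in the first column and the text in the second; the `Utterance` record, the functions `parse_manifest` and `load_manifest`, and the optional header handling are hypothetical and should be adjusted to match the released files.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass
class Utterance:
    """One row: a WEBM recording and its transcript (hypothetical record type)."""
    audio_path: Path
    text: str


def parse_manifest(lines: Iterable[str], audio_dir: Path,
                   has_header: bool = False) -> list[Utterance]:
    """Parse rows of (audio file name, text).

    Assumes two tab-separated columns; set has_header=True if the
    file starts with a header row (not confirmed by the dataset card).
    """
    # QUOTE_NONE keeps any quote characters in the text exactly as written.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # discard the header row
    items: list[Utterance] = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, text = row[0].strip(), row[1].strip()
        items.append(Utterance(audio_dir / name, text))
    return items


def load_manifest(tsv_path: Path, audio_dir: Path,
                  has_header: bool = False) -> list[Utterance]:
    """Open a UTF-8 TSV file from disk and parse it with parse_manifest."""
    with open(tsv_path, encoding="utf-8", newline="") as f:
        return parse_manifest(f, audio_dir, has_header)
```

For example, `load_manifest(Path("metadata.tsv"), Path("audio"))` would return one record per non-empty row; both names are placeholders, not the actual file layout of the release. Separating parsing from file opening keeps the parser easy to test on in-memory lines.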
Latin alphabet (A–Z), Arabic numerals (0–9)
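To spot transcripts containing characters outside this set, a small check like the one below could be run over the text column. This is a minimal Python sketch; the allow-list, in particular which punctuation marks count as valid, is an assumption based on the sample sentences above rather than a documented specification, and `unexpected_chars` is a hypothetical helper name.

```python
import re

# Assumed allow-list: Latin letters, Arabic numerals, whitespace, and a few
# punctuation marks seen in the sample sentences (not an official spec).
ALLOWED = re.compile(r"[A-Za-z0-9\s.,?!'\-]")


def unexpected_chars(text: str) -> set[str]:
    """Return the characters in text that fall outside the assumed allow-list."""
    return {ch for ch in text if not ALLOWED.fullmatch(ch)}
```

An empty set means the transcript fits the assumed character set; anything else flags the row for manual review.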
www.linkedin.com/in/riska-legistari-febri-5aab98252