License: CC-BY-SA-4.0
Steward: Community
Dataset ID: cmiepnyu1001jo207hzslvb5m
Task: ASR
Release Date: 11/25/2025
Format: mp3
Size: 302.97 MB
This dataset features discussions of modern media (including film, podcasts, and social media) and their connection to local customs and traditions. The conversations are primarily in Indonesian, with frequent code-switching into English and Javanese.
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
This dataset is derived from the Homostoria podcast.
Language: Bahasa Indonesia - Indonesian (id)
Topic: Global and local modern media discussions.
This dataset contains 11 hours of spontaneous speech across 16 audio files.
It was transcribed with an automatic transcription tool (Transkriptor) and then manually reviewed by native-speaker linguists.
The columns in the .tsv file contain the following information:
"audio file": the name of the audio file
"start": the time at which the speech segment begins
"end": the time at which the speech segment ends
"text": the transcription of the speech segment
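The column layout above can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader for this dataset. It assumes that the file is tab-separated with a header row that uses exactly the column names listed, that fields are not quoted, and that `start`/`end` are either plain seconds or colon-separated `HH:MM:SS` values. The card does not specify the timestamp format. The function names and the sample file name `episode_01.mp3` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def parse_time(value: str) -> float:
    """Convert a timestamp to seconds.

    Assumption: accepts plain seconds ("12.5") or colon-separated
    "HH:MM:SS(.ms)" / "MM:SS" values, since the card does not state the format.
    """
    seconds = 0.0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def load_segments(tsv_text: str) -> list[dict]:
    """Parse the transcript TSV into segment dicts with times in seconds."""
    # Assumption: fields are unquoted, so quote characters in "text" are kept as-is.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    segments = []
    for row in reader:
        start = parse_time(row["start"])
        end = parse_time(row["end"])
        # Sanity check: a segment should never end before it begins.
        if end < start:
            raise ValueError(f"segment ends before it starts: {row}")
        segments.append({
            "audio_file": row["audio file"],
            "start": start,
            "end": end,
            "text": row["text"],
        })
    return segments


def speech_seconds_per_file(segments: list[dict]) -> dict[str, float]:
    """Sum the transcribed speech duration for each audio file."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["audio_file"]] += seg["end"] - seg["start"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical rows: file name and timestamps are illustrative only.
    sample = (
        "audio file\tstart\tend\ttext\n"
        "episode_01.mp3\t12.0\t14.5\tYa, secure lah.\n"
        "episode_01.mp3\t14.5\t18.0\tYa, at least secure misal kayak gitu.\n"
    )
    print(speech_seconds_per_file(load_segments(sample)))  # {'episode_01.mp3': 6.0}
```

Summing per-file durations gives a quick way to check the transcripts against the stated total of about 11 hours. If the real timestamps use a different format, only `parse_time` needs to change.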
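Sample transcriptions (English translation in parentheses):
Ya, secure lah. (Yeah, secure, you know.)
Ya, at least secure misal kayak gitu. (Yeah, at least secure, something like that.)
Jadi mungkin pemaknaan gitu ya. (So maybe it's a matter of interpretation, you know.)
Mungkin yang kita bawa itu pemaknaan bahwa self-help ini nggak hanya hal-hal yang seperti itu gitu. (Maybe what we bring is the interpretation that self-help isn't only things like that.)
Tapi mungkin lebih luas gak sih? Kalau menurutmu gimana nih, Hans? (But maybe it's broader, isn't it? What do you think, Hans?)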