License:
CC-BY-NC-4.0
Steward:
Community
Dataset ID:
cmpwi2f8y00y5nv06lptoxf6n
Task: TTS
Release Date: 6/2/2026
Format: WEBM, TSV
Size: 281.70 MB
This speech corpus is part of a project to provide datasets in the Mandar language, created by native members of the Mandar community from West Sulawesi Province, Indonesia. The dataset is intended to support the preservation of regional languages, linguistic research, and the development of natural language processing (NLP) and speech recognition technologies for local Indonesian languages.
Licensing
Creative Commons Attribution Non Commercial 4.0 International (CC-BY-NC-4.0)
https://spdx.org/licenses/CC-BY-NC-4.0.html
Restrictions/Special Constraints
This dataset is for scientific use, including academic and industrial research, and may not be used for commercial purposes.
Forbidden Usage
This dataset must not be used for any illegal purposes.
Ethical Review
The dataset was created in three main stages. First, texts were written in Indonesian and translated into the Mandar language as used in Polewali Mandar, West Sulawesi Province, Indonesia. Next, the Mandar texts were read aloud and recorded by native speakers from the Balanipa region. Finally, all of the audio recordings were compiled into a comprehensive dataset.
Intended Use
This dataset is intended for the preservation and documentation of the Mandar language, particularly the Mandar Balanipa dialect, as well as for linguistic research and the development of regional languages in the digital realm.
This dataset uses the Mandar language with code-mixing in Indonesian, spoken by native speakers of the Balanipa dialect.
Rahmat, Muhammad Abyan. (2026). Mandar Balanipa Speech Corpus [Dataset]. Mozilla Data Collective. URL [Dataset link].
Created by the owner of the dataset, who is regarded as a linguist and is a native speaker.
The data cover general topics and everyday conversations, including education, vacations, the environment, culture, and social media, among others.
5 hours
Audio file name, text
“Mua’ na malimang lala a’ dzi wattu subuh pendai’ na allo”
“Arereee puang musanga a makkuliah dzio arsitek neee”
“Melo maissang contoh na, mappake'de tenda mau to mipelei anna to likka.”
“Inggannana pappegauang iya na paturu kindo kama u iya mo na simata u pogau bassa dzi te'e mappaccingngi kamar.”
“Andangi tu'u mala na natarrusang marondong na apa na mulai mi marondong acara na.”
Latin alphabet (A–Z), Arabic numerals (0–9)
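Because the card lists the fields as an audio file name and its text, delivered as TSV alongside WEBM audio, a minimal Python sketch of reading such a transcript file may be useful. It rests on assumptions this card does not confirm: that the TSV has those two columns in that order, and that it may or may not begin with a header row. The function name `read_transcripts`, the column names, and the file name `clip_0001.webm` are hypothetical; only the sample sentence comes from this card.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_transcripts(lines: Iterable[str], has_header: bool = True) -> List[Tuple[str, str]]:
    """Parse tab-separated rows into (audio_file_name, text) pairs.

    Assumption: the first column is the audio file name and the second is
    the transcript. The real column layout is not confirmed by the card.
    """
    # QUOTE_NONE keeps apostrophes and quotation marks in the text literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for index, row in enumerate(reader):
        if has_header and index == 0:
            continue  # skip the assumed header row
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Hypothetical in-memory sample: the file name and header are invented,
# the sentence is one of the examples quoted in this card.
sample = io.StringIO(
    "audio_file_name\ttext\n"
    "clip_0001.webm\tArereee puang musanga a makkuliah dzio arsitek neee\n"
)
pairs = read_transcripts(sample)
```

For a file on disk, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` points to the TSV in the downloaded archive. Set `has_header=False` if the file turns out to start directly with data rows.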