License:
CC-BY-NC-4.0
Steward:
Community
Task: TTS
Release Date: 4/30/2026
Format: WEBM, TSV
Size: 287.70 MB
The Bugis language is one of the major regional languages spoken in South Sulawesi, Indonesia, particularly in areas such as Bone, Wajo, Soppeng, Sidrap, Sinjai, and Pare-pare. It consists of several dialects, including Bone, Wajo, Soppeng and Barru, with the Barru dialect showing distinctive lexical and phonological features. Bugis is used in both formal and informal communication and is often mixed with Indonesian and English through code-switching and code-mixing, especially in daily conversations and digital communication. This dataset focuses on the Barru dialect and includes both formal and informal registers. It can be used for linguistic analysis, sociolinguistic studies, dialect comparison, language preservation, and natural language processing (NLP).
Licensing
Creative Commons Attribution Non Commercial 4.0 International (CC-BY-NC-4.0)
https://spdx.org/licenses/CC-BY-NC-4.0.html
Restrictions/Special Constraints
This dataset is limited to academic and industrial research; commercial implementation is strictly prohibited without the dataset owner's approval. Prior written consent from the data owner is required to use this dataset. Please contact the administrator by submitting an access request with a clear statement of your intended use. For industrial research purposes, appropriate compensation is accepted.
Forbidden Usage
The dataset is licensed under CC-BY-NC; however, users may use and modify it for commercial purposes after submitting an access request with a clear statement of their intended use and receiving approval.
Ethical Review
Texts were first written in Indonesian and then translated into Bugis by native speakers. These texts were read aloud and recorded by native speakers of Bugis, specifically of the Barru dialect, through the hosting platform https://sabre-2.onrender.com/. The collected audio recordings were compiled into a comprehensive dataset.
Intended Use
This dataset is intended for the preservation and documentation of the Bugis language, particularly the Barru dialect, as well as for linguistic research and development of regional languages in the digital domain.
This dataset contains Bugis, a language commonly spoken in South Sulawesi, particularly its Barru dialect, including code-switching and code-mixing with Indonesian and English.
Created by the dataset owner, a linguist and native speaker.
The collection focuses on everyday Bugis language use across various topics such as daily life, education, social relationships, cultural practices, and communication in digital contexts.
5 hours
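As a rough sanity check on the card's metadata, the total size and duration together give the average bitrate of the audio. The sketch below assumes "MB" means 10^6 bytes, that "5 hours" is the total audio duration, and that the TSV file's size is negligible next to the WEBM audio; the result is an approximate average, not a documented encoding setting.

```python
# Rough average-bitrate estimate from the dataset card's metadata.
# Assumptions: "MB" = 10**6 bytes, "5 hours" is the total audio duration,
# and the TSV file contributes a negligible share of the 287.70 MB.

SIZE_MB = 287.70      # from the card's "Size" field
DURATION_HOURS = 5    # from the card's duration field


def average_bitrate_kbps(size_mb: float, duration_hours: float) -> float:
    """Return the mean bitrate in kilobits per second (1 kbit = 1000 bits)."""
    total_bits = size_mb * 1_000_000 * 8
    total_seconds = duration_hours * 3600
    return total_bits / total_seconds / 1000


print(f"{average_bitrate_kbps(SIZE_MB, DURATION_HOURS):.1f} kbps")  # about 127.9 kbps
```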
Darmas, Aridha. (2026). TTS Bugis - Barru Dialect: Language and Identity [Dataset]. Mozilla Data Collective. URL [Dataset Link]
Audio file name, sentence.
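For TTS training, the TSV is typically read into (audio path, transcript) pairs. The following is a minimal sketch using only the Python standard library, assuming each row holds exactly two tab-separated fields in the order listed above and that the first row is a header. The card does not document the header names, so the sketch skips that row instead of reading columns by name; `load_manifest`, `audio_dir`, and the file names in the example are hypothetical.

```python
import csv
import io
from pathlib import Path


def load_manifest(tsv_text: str, audio_dir: Path) -> list[tuple[Path, str]]:
    """Pair each audio file with its sentence.

    Assumes two tab-separated fields per row (audio file name, sentence)
    and a header row whose column names are unknown, so it is skipped.
    """
    # QUOTE_NONE keeps apostrophes and quote marks inside sentences literal.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the (assumed) header row
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        file_name, sentence = row[0].strip(), row[1].strip()
        pairs.append((audio_dir / file_name, sentence))
    return pairs


# Hypothetical example with an invented file name.
sample = "file\tsentence\nclip_0001.webm\tPattasie parellu moto' ele'.\n"
print(load_manifest(sample, Path("audio")))
```

Skipping the header by position rather than by name keeps the sketch usable whatever the real column labels turn out to be.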
“Oto pribadi e naulle lebbi manyameng i nasaba biasana pada idi mi keluarga ta.”
“Cuaca sibawa kondisi laleng e to ipikkiri topa madeceng bere' amang jokka ta.”
“Narekko makurang kedo tokki naulle napancaji ale e magatti matekko na magampang malasa.”
“Cara ta pattentui siaga doi makessing i tiwi narekko elokki lokka-lokka irita to tu pole aga yola matu.”
“Pattasie parellu moto' ele' bere' naulle massappa bale sedding na de'ppa na maloppo bombang na tasi e.”
Latin alphabet (A–Z), Arabic numerals (0–9)
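A TTS text front end usually checks transcripts against the documented character set before training. The following is a minimal sketch, assuming letters may appear in either case and that spaces plus a small punctuation set are also permitted. The punctuation list is an assumption, chosen because the sample sentences use apostrophes (e.g. "bere'") and hyphens ("lokka-lokka"); `unexpected_chars` is a hypothetical helper name.

```python
import re

# Documented script: Latin letters A-Z (either case assumed) and digits 0-9.
# Space and the punctuation . , ? ! ' - are assumed additions, not documented.
ALLOWED = re.compile(r"[A-Za-z0-9 .,?!'\-]")


def unexpected_chars(sentence: str) -> set[str]:
    """Return the characters in `sentence` that fall outside the assumed set."""
    return {ch for ch in sentence if not ALLOWED.fullmatch(ch)}


print(unexpected_chars("Cara ta pattentui siaga doi makessing i tiwi narekko elokki lokka-lokka."))
```

Any character this reports (for example an accented letter or a curly quote) can then be normalized or reviewed by hand.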
https://repositori.kemendikdasmen.go.id/35378/1/Kamus%20Bugis-Indonesia.pdf