Marma Text Corpus

Description

This dataset contains 5,675 sentences in Marma (ISO 639-3: rmz), a Tibeto-Burman language spoken primarily by the Marma people in Bangladesh and Myanmar. Each entry includes the original sentence, its normalized form, and the source of the text. The data was compiled from sources including textbooks, literature, poems, and linguist-authored sentences. The dataset is split into a training set (5,575 examples) and a test set (100 examples). It was created as part of a CLEAR Global project funded by the Australian Government Department of Foreign Affairs and Trade (DFAT).
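
To illustrate the entry layout and split sizes described above, here is a minimal Python sketch. The field names (sentence, normalized, source), the MarmaEntry class, and the check_split_sizes helper are hypothetical and chosen only for illustration; the actual column names and file layout of the released data may differ.

```python
from dataclasses import dataclass

# Split sizes as stated in the description above.
EXPECTED_SIZES = {"train": 5575, "test": 100}
EXPECTED_TOTAL = 5675


@dataclass(frozen=True)
class MarmaEntry:
    """One corpus entry; field names are assumed, not taken from the dataset."""
    sentence: str    # original sentence
    normalized: str  # normalized form of the sentence
    source: str      # origin of the text, e.g. a textbook or poem


def check_split_sizes(splits: dict[str, list[MarmaEntry]]) -> bool:
    """Return True if the splits match the sizes stated in the description."""
    sizes = {name: len(rows) for name, rows in splits.items()}
    return sizes == EXPECTED_SIZES and sum(sizes.values()) == EXPECTED_TOTAL
```

A check like this could be run once after loading the data, to confirm that no entries were dropped or duplicated while parsing.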

Specifics

Licensing

Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)

https://spdx.org/licenses/CC-BY-NC-SA-4.0.html

Considerations

Restrictions/Special Constraints

This dataset is intended solely for non-commercial research and educational purposes. Commercial use requires explicit permission from the original rights holders.

Forbidden Usage
