License:
CC-BY-4.0
Steward:
Institut Teknologi dan Bisnis Asia Malang
Dataset ID:
cmpcp7cl800ydmg077wte52mv
Task: NLP
Release Date: 5/19/2026
Format: JSON, TXT
Size: 703.77 KB
The JavLegends-NER dataset was developed to address the scarcity of labeled linguistic resources for Indonesian regional languages, specifically Javanese. It focuses on the domain of folklore and legends, which contains unique linguistic structures and cultural entities. The dataset comprises 100 documents extracted from sastra.org, a prominent digital repository for Javanese literature. Each document has been manually annotated by experts to identify six distinct entity types relevant to traditional narratives:
PERSON: Names of characters, deities, or historical figures.
LOCATION: Geographical sites, kingdoms, or mythical places.
ORGANIZATION: Groups, kingdoms, or social entities.
TIME: Temporal markers or durations within the story.
EVENT: Significant occurrences, battles, or natural phenomena.
LEGENDARY_OBJECT: Sacred heirlooms, magical weapons, or objects of cultural significance.
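The six entity types above can be turned into a label set for a token-classification model. The following is a minimal sketch only: it assumes a BIO tagging scheme, which this page does not confirm, and the names `ENTITY_TYPES`, `build_bio_labels`, `BIO_LABELS`, and `LABEL_TO_ID` are illustrative rather than part of the dataset.

```python
# Entity types exactly as listed in the dataset description.
ENTITY_TYPES = [
    "PERSON",
    "LOCATION",
    "ORGANIZATION",
    "TIME",
    "EVENT",
    "LEGENDARY_OBJECT",
]


def build_bio_labels(entity_types):
    """Return "O" followed by a B-/I- tag pair for each entity type.

    Assumption: BIO tagging. The dataset's own annotation scheme
    is not documented on this page and may differ.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


# Hypothetical label list and lookup table for model training.
BIO_LABELS = build_bio_labels(ENTITY_TYPES)
LABEL_TO_ID = {label: index for index, label in enumerate(BIO_LABELS)}
```

Under this assumption the label set has 13 entries: one "O" tag plus a begin and an inside tag for each of the six types. If the released JSON files use a different scheme (for example character-offset spans), the mapping would need to be adapted.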
Licensing
Creative Commons Attribution 4.0 International (CC-BY-4.0)
https://spdx.org/licenses/CC-BY-4.0.html
Restrictions/Special Constraints
This dataset is intended strictly for academic, research, and scientific purposes. Any publication or product resulting from the use of this dataset must provide proper attribution by citing the original research paper.
Forbidden Usage
1. Commercial redistribution or resale of the raw or modified dataset is strictly forbidden.
2. Using this dataset to train models that generate derogatory, hateful, or discriminatory content against Javanese culture or any specific group is prohibited.
3. Users must not misrepresent the data to create fake or misleading cultural and historical narratives.