Sindhi Figurative Language Dataset (SiNFLuD)

Description

SiNFLuD is a human-labelled dataset developed to support research on classifying figurative and literal language in Sindhi, a low-resource language. It includes a diverse collection of idioms, metaphors, similes, and proverbs collected from various web resources, representing culturally rich and context-dependent expressions. It is designed to enable computational models to move beyond literal meaning and capture deeper semantic and cultural interpretations in Sindhi text. The dataset is intended for training and evaluating NLP models on tasks such as figurative language classification and semantic analysis. By providing structured annotations of idiomatic and metaphorical expressions, it helps bridge the resource gap in Sindhi language processing and supports the development of more culturally aware language technologies.

Language

Sindhi (سنڌي) is an Indo-Aryan language spoken primarily in Pakistan and India. It has a strong literary and cultural tradition but remains a low-resource language in NLP, especially for figurative and semantic understanding tasks.

Script

Perso-Arabic Script (Sindhi) ا، ب، ٻ، ڀ، پ، ت، ٿ، ٽ، ٺ، ث، ج، ڄ، جھ، ڃ، چ، ڇ، ح، خ، د، ڌ، ڏ، ڊ، ڍ، ذ، ر، ڙ، ز، س، ش، ص، ض، ط، ظ، ع، غ، ف، ڦ، ق، ڪ، ک، گ، ڳ، ڱ، ل، م، ن، ڻ، و، ھ، ء، ي، ه
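
Because the text was collected from web resources, a quick script check can help spot entries that are not written in the Sindhi Perso-Arabic script. The following is a minimal sketch, not part of the dataset tooling: the function name `arabic_script_ratio` is hypothetical, and it assumes that all Sindhi letters fall in the Unicode Arabic block (U+0600 to U+06FF), which holds for the letters listed above.

```python
def arabic_script_ratio(text: str) -> float:
    """Return the share of alphabetic characters that fall in U+0600-U+06FF.

    Punctuation, digits, and whitespace are ignored. An empty or
    letter-free string yields 0.0.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_block = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return in_block / len(letters)


# Example: a Sindhi sentence from the samples versus a Latin-script string.
sindhi_ratio = arabic_script_ratio("ڪتي جا ڏند گڏھ جو ماس.")
latin_ratio = arabic_script_ratio("hello world")
print(sindhi_ratio, latin_ratio)
```

A low ratio only flags an entry for manual review; it says nothing about whether the text is actually Sindhi rather than another language written in the same script.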

Dataset Structure

SiNFLuD-Dataset/
│
└── SiNFLuD

Metadata

Field	Details
Dataset Name	Sindhi Figurative Language Dataset
Language	Sindhi (سنڌي)
Language Family	Indo-European — Indo-Aryan Branch
ISO 639-1 / 639-3	`sd` / `snd`
Script	Perso-Arabic Script (Sindhi, Unicode)
Domain	Figurative Language / NLP
Task Type	Text Classification / Figurative Language Detection
Encoding	UTF-8
Format	JSON
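
Since the data is stored as UTF-8 JSON, it is worth reading and writing it with an explicit encoding so that Sindhi characters survive on every platform. The sketch below shows one way to do that with the Python standard library. The helper names `load_records` and `save_records` and the file name `demo.json` are hypothetical, and the round trip uses a temporary file rather than the actual dataset file.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Read a UTF-8 encoded JSON file and return the parsed content."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_records(records, path):
    """Write records as UTF-8 JSON, keeping Sindhi characters readable."""
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False stores the letters themselves instead of \uXXXX escapes.
        json.dump(records, fh, ensure_ascii=False, indent=2)


# Round-trip demo with one record taken from the sample text.
demo = [{"text": "ڪتي جا ڏند گڏھ جو ماس.", "label_name": "non_literal", "type": "metaphor"}]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.json"  # hypothetical file name
    save_records(demo, demo_path)
    raw_text = demo_path.read_text(encoding="utf-8")
    loaded = load_records(demo_path)

print(loaded == demo)
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default, which is not UTF-8 on every system.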

Sample Text

[
  {
    "text": ".اھو ڪي ڪجي جو مينهن وسندي ڪم اچي",
    "label_name": "non_literal",
    "type": "idiom"
  },
  {
    "text": "سائي کي سهي ڪو نه بکئي کي ڏئي ڪو نه.",
    "label_name": "non_literal",
    "type": "proverb"
  },
  {
    "text": "ڪتي جا ڏند گڏھ جو ماس.",
    "label_name": "non_literal",
    "type": "metaphor"
  }
]
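
Records in this shape can be checked and summarised with a few lines of Python. The sketch below is illustrative only: it assumes the records are stored as a JSON array of objects with the three fields shown above, and the helper names `normalize_label` and `parse_records` are hypothetical. It also treats hyphen and underscore spellings of a label as the same class, which is an assumption about the annotation scheme.

```python
import json
from collections import Counter

REQUIRED_FIELDS = ("text", "label_name", "type")


def normalize_label(label: str) -> str:
    """Map spelling variants such as 'non-literal' to 'non_literal'."""
    return label.strip().lower().replace("-", "_")


def parse_records(raw: str) -> list[dict]:
    """Parse a JSON array of records and check that each has the expected fields."""
    records = json.loads(raw)
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        rec["label_name"] = normalize_label(rec["label_name"])
    return records


# Two records from the sample text, wrapped in an array (assumed layout).
sample = """[
  {"text": "ڪتي جا ڏند گڏھ جو ماس.", "label_name": "non_literal", "type": "metaphor"},
  {"text": ".اھو ڪي ڪجي جو مينهن وسندي ڪم اچي", "label_name": "non-literal", "type": "idiom"}
]"""

records = parse_records(sample)
label_counts = Counter(r["label_name"] for r in records)
type_counts = Counter(r["type"] for r in records)
print(label_counts)
print(type_counts)
```

Counting labels and types this way gives a quick view of class balance before training a classifier.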
