Task: NLP
Release Date: 6/10/2026
Format: TXT
Size: 19.87 MB
This dataset is a curated Telugu text corpus built to support Natural Language Processing (NLP) research and development for the Telugu language. It contains text collected from multiple written sources, including news, blog, spiritual, and literary writing, giving it broad linguistic coverage. The corpus can be used for language modeling, text classification, tokenization, named entity recognition, machine translation, information retrieval, and related NLP research, and it is particularly valuable for improving resources and tools for low-resource Indic languages such as Telugu. The text has been organized and cleaned to support academic research, benchmarking, and the development of AI systems for Telugu language understanding and generation. It is intended for researchers, students, and developers working on Indic language technologies and multilingual AI systems.
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Restrictions/Special Constraints
This dataset is intended primarily for research, educational, and scientific purposes in the field of Natural Language Processing and language technology development. Users must not use the dataset for unlawful, harmful, or discriminatory activities. Redistribution of the dataset without proper attribution is prohibited. Any use that violates applicable laws, privacy rights, or ethical AI practices is strictly forbidden.
Forbidden Usage
Any use of the dataset for generating harmful, misleading, or deceptive content is strictly prohibited.
Intended Use
This dataset is intended for developing and evaluating natural language processing models for Telugu text understanding and analysis.
Telugu (తెలుగు), also known as Classical Telugu or Standard Telugu, is a Dravidian language of the South-Central Dravidian branch. It is the official language of the Indian states of Andhra Pradesh and Telangana and is also spoken in Karnataka, Tamil Nadu, and Maharashtra, and among large diaspora communities in the United States, Malaysia, and the Gulf region. According to Glottolog, it belongs to the South-Central Dravidian group alongside Gondi and Konda. Telugu is recognized as one of the classical languages of India, with a literary tradition dating back over a thousand years. Many speakers are also bilingual in Hindi or English, depending on their region and level of education.
Telugu Script
అ, ఆ, ఇ, ఈ, ఉ, ఊ, ఋ, ఎ, ఏ, ఐ, ఒ, ఓ, ఔ, అం, అః, క, ఖ, గ, ఘ, ఙ, చ, ఛ, జ, ఝ, ఞ, ట, ఠ, డ, ఢ, ణ, త, థ, ద, ధ, న, ప, ఫ, బ, భ, మ, య, ర, ల, వ, శ, ష, స, హ, ళ, క్ష, ఱ, ం, ః, ఁ
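The characters listed above all fall within the Unicode Telugu block (U+0C00 to U+0C7F), which suggests a simple way to check how much of a text sample is actually in Telugu script. The following is a minimal sketch, not part of the dataset's tooling; the function names `is_telugu_char` and `telugu_ratio` are hypothetical and introduced only for illustration.

```python
# Assumption: a character counts as "Telugu" if it lies in the
# Unicode Telugu block, U+0C00 through U+0C7F.
TELUGU_START, TELUGU_END = 0x0C00, 0x0C7F


def is_telugu_char(ch: str) -> bool:
    """Return True if the single character lies in the Telugu block."""
    return TELUGU_START <= ord(ch) <= TELUGU_END


def telugu_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are in the Telugu block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_telugu_char(ch) for ch in chars) / len(chars)
```

A ratio well below 1.0 on a given line may indicate mixed-script content (Latin letters, digits, punctuation), which can be useful when filtering text before tokenization or language modeling.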
News Article: Formal journalistic articles covering regional and national news in Telugu.
News Blog: Informal blog-style news writing capturing current affairs, regional developments, and journalistic commentary.
Article Blog: General-purpose blog articles spanning a range of topics for a broad Telugu-speaking audience.
Spiritual Article Blog: Writing focused on spirituality, philosophy, and devotional themes rooted in Telugu cultural tradition.
Literature Blog: Literary writing including prose, narrative, and cultural commentary in the Telugu literary tradition.
| Field | Details |
|---|---|
| Dataset Name | Telugu Text Corpus |
| Language | Telugu (తెలుగు) |
| Language Family | Dravidian — South-Central Dravidian Branch |
| Number of Authors | 4 |
| Number of Domains | 5 (News Article, News Blog, Article Blog, Spiritual Article Blog, Literature Blog) |
| File Format | Plain Text (.txt) |
| Content Type | News, Spiritual Writing, Literary Prose |
| File Name | Author | Domain |
|---|---|---|
| 01-Telugu News Article Collection.txt | Nagabhushanam Boga | News Article / News Blog |
| 02-Telugu Article Blog Collection.txt | Poduri Gopala Rao | Article Blog / Spiritual Article Blog |
| 03-Telugu Literature Blog Collection.txt | Sarat Chandra | Literature Blog |
| 04-Telugu Article Blog Collection.txt | Velmajala Narasimha | Article Blog |
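The file table above maps naturally onto a small loader that attaches author and domain metadata to each text file. The sketch below is one possible approach under stated assumptions: the files are assumed to be UTF-8 encoded and to sit together in a single directory, and the names `FILES`, `CorpusFile`, and `load_corpus` are hypothetical helpers, not part of the dataset.

```python
from dataclasses import dataclass
from pathlib import Path

# File names, authors, and domains copied from the table above.
FILES = {
    "01-Telugu News Article Collection.txt": (
        "Nagabhushanam Boga", ("News Article", "News Blog")),
    "02-Telugu Article Blog Collection.txt": (
        "Poduri Gopala Rao", ("Article Blog", "Spiritual Article Blog")),
    "03-Telugu Literature Blog Collection.txt": (
        "Sarat Chandra", ("Literature Blog",)),
    "04-Telugu Article Blog Collection.txt": (
        "Velmajala Narasimha", ("Article Blog",)),
}


@dataclass
class CorpusFile:
    name: str
    author: str
    domains: tuple
    text: str


def load_corpus(root: str, encoding: str = "utf-8") -> list:
    """Read each listed file found under `root`; missing files are skipped.

    Assumption: the files are UTF-8 encoded. Pass another `encoding`
    if the downloaded copies turn out to differ.
    """
    records = []
    for name, (author, domains) in FILES.items():
        path = Path(root) / name
        if not path.is_file():
            continue
        text = path.read_text(encoding=encoding)
        records.append(CorpusFile(name, author, domains, text))
    return records
```

Calling `load_corpus("path/to/dataset")` returns one record per file that is present, so per-author or per-domain subsets can be selected by filtering on `author` or `domains`.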
నిజంగా అభినందించాల్సిందే — తెలంగాణ రాష్ట్రంలో ఉన్న జైళ్ల లోని ఖైదీలకు వేసవి కాలంలో ఇచ్చే భోజనంలో పచ్చి పులుసు చేర్చడం నిజంగా అభినందనీయం.
ఆరోగ్య రీత్యా శరీరంలోని వేడికి ఉపశమనంగా ఇది అందించడం మెచ్చుకోదగినది.
ఆయుర్వేదం ప్రకారం పిత్త దోషాన్ని హరించే గుణమున్న ఈ పచ్చి పులుసును లంచ్, డిన్నర్ లో చేర్చి ఖైదీలకు ఇవ్వాలని తెలంగాణ ప్రభుత్వం నిర్ణయించింది.
ఇందులో ఉల్లిపాయలు కూడా ఉంటాయి కాబట్టి తీసుకున్న వారికి ఇది చలవ చేస్తుంది.
దీన్ని తయారు చేయడం కూడా ఎంతో సులభం. మొత్తం మీద ఇంత కాలానికి పచ్చి పులుసుకు అధికార హోదా లభించినట్లయింది.
English translation of the sample above: Truly commendable: adding pachi pulusu to the meals served in summer to prisoners in the jails of Telangana state is truly commendable. Providing it for health reasons, as relief from heat in the body, deserves praise. The Telangana government has decided to include pachi pulusu, which according to Ayurveda has the property of relieving pitta dosha, in the lunch and dinner given to prisoners. Because it also contains onions, it has a cooling effect on those who eat it. It is also very easy to prepare. All in all, after all this time, pachi pulusu has gained official status.