Task: ASR
Release Date: 4/6/2026
Format: OGG, SRT
Size: 37.11 MB
The Tamil Time-Aligned Speech Dataset is a curated 5-hour speech corpus consisting of Tamil audio recordings paired with precise time-aligned transcriptions. The dataset is designed to support a wide range of speech and language technology tasks, including automatic speech recognition, forced alignment, speech segmentation, subtitle generation, and timestamp-aware linguistic analysis. By preserving the correspondence between spoken audio and textual content at the segment level, the dataset enables detailed study of pronunciation, timing, and spoken language structure. It is a useful resource for researchers, developers, and institutions working on Tamil speech technologies and low-resource language processing.
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Restrictions/Special Constraints
No speaker identification, surveillance, or harmful use permitted.
Forbidden Usage
You agree not to attempt to determine the identity of speakers in this dataset.
Intended Use
This dataset is intended for use in creating automatic speech recognition systems.
Tamil is a major Dravidian language spoken primarily in Tamil Nadu, Sri Lanka, and Tamil-speaking communities around the world, with a long literary history and rich linguistic tradition.
அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன ஜ ஷ ஸ ஹ க்ஷ ஶ ா ி ீ ு ூ ெ ே ை ொ ோ ௌ ் ஂ ஃ
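The character inventory above can serve as a quick sanity check on transcript text. The following is a minimal sketch, assuming the listed characters are the complete set expected in the transcripts; `unexpected_characters` is an illustrative helper name, not something shipped with the dataset. Whitespace is ignored, and anything else outside the inventory (digits, punctuation, Latin letters) is reported.

```python
import unicodedata

# Character inventory copied from the dataset card.
CHARSET_LINE = (
    "அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன "
    "ஜ ஷ ஸ ஹ க்ஷ ஶ ா ி ீ ு ூ ெ ே ை ொ ோ ௌ ் ஂ ஃ"
)

# Work at the code-point level: the conjunct க்ஷ is three code points
# (க + ் + ஷ), each of which is listed individually as well.
ALLOWED = {ch for ch in CHARSET_LINE if not ch.isspace()}


def unexpected_characters(text):
    """Return the set of non-whitespace characters outside the inventory."""
    # NFC composes split vowel signs (e.g. ெ + ா becomes ொ) before checking.
    normalized = unicodedata.normalize("NFC", text)
    return {ch for ch in normalized if not ch.isspace() and ch not in ALLOWED}
```

An empty result means the text uses only characters from the inventory; a non-empty result lists the characters worth inspecting.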
Main folders: Audio/ and Time-Aligned_Transcripts/
Speaker-wise organization: each main folder contains Speaker_1/ and Speaker_2/
Audio folder: stores speech recordings for each speaker
Transcript folder: stores corresponding time-aligned transcript files for each speaker
Parallel structure: transcript files follow the same speaker-based organization as the audio files
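Because the two folders mirror each other, audio and transcript files can be paired by relative path. The following is a minimal sketch under stated assumptions: the card does not say how individual files are named, so the code assumes each transcript shares its file stem with the matching audio file and that the extensions are `.ogg` and `.srt` (taken from the Format field). `pair_files` is an illustrative name.

```python
from pathlib import Path


def pair_files(root, audio_ext=".ogg", transcript_ext=".srt"):
    """Pair each audio file with the transcript at the same relative path.

    Assumption (not stated in the dataset card): a transcript has the same
    stem as its audio file, e.g. Speaker_1/x.ogg <-> Speaker_1/x.srt.
    """
    root = Path(root)
    audio_root = root / "Audio"
    transcript_root = root / "Time-Aligned_Transcripts"

    pairs = []
    for audio_path in sorted(audio_root.rglob(f"*{audio_ext}")):
        # Keep the Speaker_N/ sub-path, then swap the file extension.
        relative = audio_path.relative_to(audio_root)
        transcript_path = (transcript_root / relative).with_suffix(transcript_ext)
        if transcript_path.exists():
            pairs.append((audio_path, transcript_path))
    return pairs
```

Audio files without a matching transcript are skipped rather than raising an error, which makes it easy to compare the number of pairs against the number of audio files and spot gaps.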
Speaker 1: Age: 30, Gender: Male, Region: Neelagiri
Speaker 2: Age: 30, Gender: Female, Region: Chengalpattu
1
00:00:00,023 --> 00:00:04,020
நவீன வாழ்க்கையில தொழில்நுட்பத்தோட பங்கு

2
00:00:04,057 --> 00:00:05,855
அனைவருக்கும்

3
00:00:05,880 --> 00:00:10,620
நம்மளோட வந்து நவீன வாழ்க்கையில தொழில் நுட்பத்தோட பங்கு வந்து என்னனா

4
00:00:10,644 --> 00:00:15,153
இந்த உலகத்தில வந்து தொழில் நுட்பம் இல்லாம சிந்திக்க முடியாது அளவுக்கு வளர்ந்துடுச்சு
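The sample cues above can be read into structured segments with a small parser. The following is a minimal sketch, assuming the transcript files use the standard SRT layout (an index line, a timing line, one or more text lines, and a blank line between cues). `Segment` and `parse_srt` are illustrative names, not part of the dataset. Times are kept as integer milliseconds to avoid floating-point rounding.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:04,057 --> 00:00:05,855".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Segment:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(content):
    """Parse SRT text into a list of Segment objects, skipping malformed cues."""
    segments = []
    # Drop a leading byte-order mark, then split cues on blank lines.
    for block in re.split(r"\n\s*\n", content.lstrip("\ufeff").strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2 or not lines[0].isdigit():
            continue
        match = TIMING.search(lines[1])
        if match is None:
            continue
        fields = match.groups()
        segments.append(
            Segment(
                index=int(lines[0]),
                start_ms=_to_ms(*fields[:4]),
                end_ms=_to_ms(*fields[4:]),
                text=" ".join(lines[2:]),
            )
        )
    return segments
```

Each segment's `start_ms` and `end_ms` can then be used to slice the corresponding audio for ASR training or to check that cues do not overlap.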