Task: CV
Release Date: 5/13/2026
Format: JPEG, JSON
Size: 190.44 MB
This dataset consists of traffic sign images collected from different regions of India, including signs in English and regional languages. It covers a wide range of categories such as regulatory, warning, and informational signs, reflecting real-world road environments. The images are captured under diverse conditions, including varying lighting, weather, angles, and occlusions, making the dataset suitable for robust model development. This dataset is designed to support tasks such as image classification, object detection, and multilingual traffic sign recognition, and can be used for research and applications in computer vision and intelligent transportation systems.
Licensing
Creative Commons Attribution-NoDerivatives 4.0 International (CC-BY-ND-4.0)
Restrictions/Special Constraints
NA
Forbidden Usage
NA
Intended Use
For training and evaluating computer vision models for traffic sign detection, classification, and multilingual recognition.
The dataset was collected from multiple urban and rural regions across India, capturing real-world traffic scenarios in their natural setting. Images vary in camera device type, shooting distance, and perspective angle, so the dataset reflects the diversity and complexity of on-ground traffic environments across different regions.
Standard preprocessing steps have been applied, including image resizing and normalization. Importantly, some images intentionally retain natural noise, motion blur, and occlusions to preserve the realism of traffic scenarios and improve model generalization in practical real-world deployments.
A distinguishing feature of this dataset is the inclusion of traffic signs in both English and regional Indian languages within the same collection. This multilingual characteristic makes it particularly valuable for multilingual sign recognition, regional language OCR, and cross-lingual traffic sign analysis tasks.
The dataset encompasses a broad range of environmental and lighting conditions, including daytime, nighttime, varying weather scenarios, and complex backgrounds. This diversity is intentional: it is meant to improve model robustness under real-world deployment conditions in both urban and rural settings.
This dataset is well-suited for a range of computer vision tasks, including:
Image classification and object detection
Traffic sign recognition and localization
Multilingual and regional language OCR on traffic signage
Training and benchmarking AI models for intelligent transportation systems (ITS)
Autonomous driving and road scene understanding
Certain traffic sign categories may be underrepresented, leading to class imbalance across the dataset.
Users are advised to apply data augmentation or class-balancing techniques depending on the requirements of their specific use case.
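One common class-balancing technique is inverse-frequency weighting, where each class receives a loss weight proportional to the inverse of its sample count. The snippet below is a minimal sketch of that idea, not a method prescribed by the dataset authors. The category names (`stop`, `speed_limit`, `school_zone`) and their counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight} with weight = N / (K * count).

    N is the number of samples and K the number of distinct classes,
    so rarer classes receive larger weights.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {
        label: total / (num_classes * count)
        for label, count in counts.items()
    }


# Hypothetical label list; real labels would come from the annotation file.
labels = ["stop"] * 6 + ["speed_limit"] * 3 + ["school_zone"] * 1
weights = inverse_frequency_weights(labels)

for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label}: {weight:.3f}")
```

The resulting weights can be passed to a weighted loss function or used to drive a weighted sampler, depending on the training framework in use.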
Model performance may vary for edge cases involving heavily occluded, damaged, or regionally uncommon signage.
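The annotation excerpt that follows has top-level `info`, `licenses`, and `images` keys, which resembles a COCO-style layout. Assuming the full file is a single JSON object with those keys, the sketch below indexes image records by `id`. The enclosing braces and the helper name `index_images` are assumptions, and only the fields visible in the excerpt are used.

```python
import json


def index_images(doc):
    """Map image id -> image record for a dict with an "images" list."""
    return {img["id"]: img for img in doc.get("images", [])}


# Two records copied from the excerpt; the outer object is assumed.
sample = json.loads("""
{
  "info": {
    "description": "CaLI captions dataset",
    "version": "1.0",
    "date_created": "2026/04/10"
  },
  "licenses": [],
  "images": [
    {"id": 1,
     "file_name": "images/ea06fef4-dc5a-4cc5-b9eb-b5a2d2d2573f.jpeg",
     "width": 958, "height": 1280},
    {"id": 2,
     "file_name": "images/285cf66e-8dd9-474d-beb0-f4156837f8bb.jpeg",
     "width": 958, "height": 1280}
  ]
}
""")

by_id = index_images(sample)

# Ids of images that are taller than they are wide.
portrait_ids = [i for i, img in by_id.items() if img["height"] > img["width"]]
print(len(by_id), portrait_ids)
```

Any label or bounding-box fields would need to be read from whatever keys the full file actually provides, since they do not appear in the excerpt.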
"info": {
"description": "CaLI captions dataset",
"version": "1.0",
"date_created": "2026/04/10"
},
"licenses": [],
"images": [
{
"id": 1,
"file_name": "images/ea06fef4-dc5a-4cc5-b9eb-b5a2d2d2573f.jpeg",
"width": 958,
"height": 1280
},
{
"id": 2,
"file_name": "images/285cf66e-8dd9-474d-beb0-f4156837f8bb.jpeg",
"width": 958,
"height": 1280
},
{
"id": 3,
"file_name": "images/3a237c54-39f2-494d-a127-a9fbf4c1d0fa.jpeg",
"width": 958,
"height": 1280
},
{
"id": 4,
"file_name": "images/12476665-32d0-4230-9343-f74cc03de747.jpeg",
"width": 958,
"height": 1280