Task: CV
Release Date: 5/13/2026
Format: JSON, JPEG
Size: 1.35 GB
This dataset consists of traffic sign images collected from Bangladesh, capturing both Bengali and English signage commonly found on roads. It includes multiple categories such as regulatory, warning, and informational signs, reflecting real-world driving environments. The images are taken under diverse conditions, including different lighting, weather, angles, and levels of occlusion, making the dataset suitable for robust model training. The dataset is designed to support tasks such as image classification, object detection, and multilingual traffic sign recognition, and can be used for research and development in computer vision and intelligent transportation systems.
Licensing
Creative Commons Attribution Non Commercial 4.0 International (CC-BY-NC-4.0)
Restrictions/Special Constraints
This dataset is strictly intended for non-commercial use and may not be used for commercial purposes without prior permission.
Forbidden Usage
Use of this dataset for unlawful surveillance, harmful autonomous systems, or malicious activities is forbidden.
Intended Use
Intended for training, evaluation, and research of AI models for traffic sign detection, recognition, classification, and multilingual transportation-related vision systems.
The dataset was collected from diverse urban areas across Bangladesh, capturing real-world traffic conditions in their natural setting. Images reflect a wide range of variables including camera quality, shooting distance, and perspective angle, ensuring the dataset mirrors the complexity of on-ground traffic environments.
Standard preprocessing steps have been applied, including image resizing and normalization. Importantly, some images intentionally retain natural noise, motion blur, and occlusions to preserve the realism of traffic scenarios and improve model generalization in practical deployments.
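The sample annotations further down list images at 640×640 as well as 3000×4000, so a training pipeline may still need its own resize and normalization step. The following is a minimal sketch of one common approach, an aspect-preserving "letterbox" resize plus scaling pixel values to [0, 1]. It is not the preprocessing the dataset authors applied; the function names and the 640-pixel target size are assumptions made for illustration.

```python
def letterbox_size(width, height, target=640):
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving resize.

    `target` is an assumed square input size, not a value stated by the dataset.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Total padding needed on each axis to fill a target x target canvas.
    pad_x = target - new_w
    pad_y = target - new_h
    return new_w, new_h, pad_x, pad_y


def normalize_pixel(value):
    """Map an 8-bit channel value (0-255) to the range [0.0, 1.0]."""
    return value / 255.0


# Records shaped like entries of the annotation file's "images" array.
images = [
    {"id": 1, "width": 640, "height": 640},
    {"id": 2, "width": 3000, "height": 4000},
]
sizes = {img["id"]: letterbox_size(img["width"], img["height"]) for img in images}
```

Under these assumptions, a 3000×4000 image is scaled to 480×640 and padded horizontally by 160 pixels in total, which keeps sign shapes undistorted. Any bounding-box annotations would need the same scale and offset applied.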
A distinguishing feature of this dataset is the co-presence of both Bengali and English traffic signs within the same collection. This bilingual characteristic makes it particularly valuable for multilingual sign recognition tasks and reflects the real signage landscape of Bangladeshi urban infrastructure.
The dataset encompasses a broad range of environmental and lighting conditions, including daylight, nighttime, shadows, and varying weather scenarios. This diversity is intentional, designed to enhance model robustness and ensure reliable performance across real-world deployment conditions.
This dataset is well-suited for a range of computer vision tasks, including:
Image classification and object detection
Traffic sign recognition and localization
Training and benchmarking AI models for intelligent transportation systems (ITS)
Multilingual OCR and sign text extraction
Some traffic sign classes may be imbalanced, with rarer sign types having fewer representative samples.
Users are advised to apply data augmentation or class-balancing techniques depending on the requirements of their specific use case.
Model performance may vary on edge cases involving heavily occluded or damaged signage.
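One simple class-balancing technique is to weight each class inversely to its frequency, so that rare sign types contribute more to the training loss. The sketch below is illustrative only: the label names are hypothetical and do not come from the dataset, and the weighting formula (total samples divided by the number of classes times the class count) is one common heuristic among several.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Hypothetical labels; the real class names come from the annotation file.
labels = ["stop"] * 6 + ["speed_limit"] * 3 + ["school_zone"] * 1
weights = inverse_frequency_weights(labels)
```

With this formula the per-sample weights sum to the number of samples, so the overall loss scale stays roughly unchanged while the rarest class (`school_zone` in this toy example) receives the largest weight. The resulting dictionary could be passed to a weighted loss or used to drive a weighted sampler, depending on the training framework.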
{
  "info": {
    "description": "CaLI captions dataset",
    "version": "1.0",
    "date_created": "2026/03/30"
  },
  "licenses": [],
  "images": [
    {
      "id": 1,
      "file_name": "images/2ee5ea1e-a6d9-44f4-b25b-9ea3550dfc33.jpg",
      "width": 640,
      "height": 640
    },
    {
      "id": 2,
      "file_name": "images/23d1beaf-e509-47fb-adc4-d15caa5439f6.jpg",
      "width": 3000,
      "height": 4000
    },
    {
      "id": 3,
      "file_name": "images/5cd4a9c5-8d8d-492c-bdbc-9d7048d45f97.jpg",
      "width": 3000,
      "height": 4000