Task: OTH
Release Date: 6/5/2026
Format: MP4, SRT, JSON
Size: 40.31 GB
The Brazilian Sign Language Health Dataset is a large-scale multimodal corpus developed to advance inclusive AI and accessibility research for Deaf communities in Brazil. It focuses on healthcare communication, including medical terminology, symptoms, preventive care, mental health, and clinical interactions, while maintaining an intersectional and gender-diverse approach. The dataset contains approximately 24.6 hours of Brazilian Sign Language (Libras) recordings collected from 44 participants across 16 structured recording sessions. The corpus includes more than 8,000 annotated video segments aligned with subtitles, gloss annotations, elicitation prompts, and metadata tables designed for multimodal AI research workflows. It combines segmented video recordings, subtitle files, prompt metadata, technical recording metadata, and linguistic annotations. The dataset is designed to support research and innovation in Automatic Sign Language Recognition (ASLR), Sign Language Translation (SLT), and multimodal AI systems, and it also enables downstream tasks such as subtitle alignment, gloss prediction, gesture segmentation, temporal localisation, and non-manual feature analysis.
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
Restrictions/Special Constraints
This version of the dataset is intended solely for research, educational, and non-commercial development purposes. Any commercial use, redistribution, or deployment in commercial products or services requires prior authorisation from the dataset owners.
Forbidden Usage
The dataset must not be used for deepfake generation, deceptive or manipulative content, discriminatory systems, or any other unethical or harmful applications of AI. Any use that may stigmatise, exploit, misrepresent, or negatively impact Deaf communities or dataset participants is strictly prohibited.
Ethical Review
The dataset was curated following an inclusive and community-oriented framework prioritising accessibility, participant consent, and responsible AI development, with emphasis on gender-inclusive participation and accessibility-centered design. Contributors voluntarily participated in structured recording sessions and provided informed consent for the public research use of their recordings and annotations, while personally identifiable information unrelated to the research objectives was excluded from the released metadata. The dataset also underwent review by the Council for Orientation of Development and Ethics (CODE), an independent advisory body focused on ethical, legal, and societal considerations related to multimodal data.
Intended Use
The dataset is intended to support research and development in Automatic Sign Language Recognition (ASLR), Sign Language Translation (SLT), multimodal AI systems, accessible healthcare technologies, human-computer interaction for Deaf communities, and linguistic and sociolinguistic research on Brazilian Sign Language (Libras). It is also meant to contribute to healthcare accessibility, public health communication, and the development of educational and assistive technologies, and to support downstream tasks such as subtitle alignment, gloss prediction, gesture segmentation, temporal localisation, non-manual feature analysis, and multimodal representation learning.
The dataset is organised into standardised folders and structured metadata tables to support multimodal research workflows and reproducible experimentation. It includes MP4 video recordings, Portuguese subtitle files aligned with video segments, segment-level annotation tables, elicitation guides, participant and recording metadata, and gloss annotations with expected gloss references. All participants in the dataset are anonymised, and no personally identifiable information is included. Researchers using the dataset are responsible for complying with applicable ethical and legal requirements related to the use of human participant data.
dataset/
│
├── README.md
├── metadata.json
├── README_POR.md
├── metadata_POR.json
│
├── annotations/
│ ├── segments.csv
│ ├── segments_extra.csv
│ ├── questions.csv
│ ├── questions_extra.csv
│ ├── recordings.csv
│ └── participants.csv
│
├── videos/
│ ├── S01/
│ ├── S02/
│ └── ...
│
├── subtitles/
│ ├── S01/
│ ├── S02/
│ └── ...
The videos/ directory contains the original interview recordings organised by session and participant.
Example:
videos/S01/P01/P01_S01.mp4
Some recordings are divided into multiple parts:
videos/S07/P29/P29_S07_part01.mp4
videos/S07/P29/P29_S07_part02.mp4
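The naming pattern in the example paths above can be parsed programmatically. The following is a minimal sketch, assuming every recording follows the `P<digits>_S<digits>[_part<digits>]` pattern inferred from those examples; the function name `parse_recording_name` is hypothetical and not part of the dataset.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from the example paths (an assumption, not a documented spec):
# P<digits>_S<digits>, optionally followed by _part<digits>.
_NAME_RE = re.compile(r"^(P\d+)_(S\d+)(?:_part(\d+))?$")


def parse_recording_name(path):
    """Split a recording path into participant, session, and optional part number."""
    stem = PurePosixPath(path).stem  # file name without directory or extension
    match = _NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected recording name: {path!r}")
    participant_id, session_id, part = match.groups()
    return {
        "participant_id": participant_id,
        "session_id": session_id,
        "part": int(part) if part is not None else None,
    }


print(parse_recording_name("videos/S01/P01/P01_S01.mp4"))
print(parse_recording_name("videos/S07/P29/P29_S07_part02.mp4"))
```

Raising on unmatched names, rather than returning a partial result, makes files that deviate from the assumed pattern visible early.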
The subtitles/ directory contains subtitle files generated from interview recordings.
Example:
subtitles/S01/P01/P01_S01.srt
Subtitle timestamps may not perfectly align with sign boundaries because spoken/written language timing differs from sign articulation timing.
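Because subtitle timing and segment timing can drift apart, one possible approach is to match subtitle cues to a segment using a padded time window rather than exact boundaries. The sketch below assumes standard SRT timestamps (`HH:MM:SS,mmm`) and cues already read into `(start, end, text)` tuples in seconds; the helper names, the 0.5 s padding, and the sample cues are illustrative assumptions, not values from the dataset.

```python
def srt_time_to_seconds(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    clock, millis = stamp.strip().split(",")
    hours, minutes, seconds = (int(x) for x in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000.0


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals, or 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def cues_for_segment(cues, seg_start, seg_end, pad=0.5):
    """Return cues that overlap the segment after widening it by `pad` seconds."""
    lo, hi = seg_start - pad, seg_end + pad
    return [c for c in cues if overlap_seconds(c[0], c[1], lo, hi) > 0.0]


# Invented cues for illustration only: (start_s, end_s, text).
cues = [(0.0, 1.0, "a"), (1.2, 2.0, "b"), (5.0, 6.0, "c")]
print(cues_for_segment(cues, seg_start=1.5, seg_end=2.5))
```

The padding is a tunable tolerance; a suitable value would need to be chosen by inspecting real subtitle and segment pairs.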
segments.csv
Contains segment-level annotations for the main elicitation protocol.
Each row corresponds to a segmented sign production or response interval.
Main fields include:
| Column | Description |
|---|---|
| segment_id | Unique segment identifier |
| session_id | Recording session identifier |
| participant_id | Anonymous participant identifier |
| video_file | Source video filename |
| question_id | Prompt identifier |
| start_time | Segment start time in seconds |
| end_time | Segment end time in seconds |
| duration | Segment duration |
| prompt | Prompt text |
| section | Semantic domain |
| prompt_type | Lexical or open-ended |
| mode | Controlled, guided, or free |
| expected_gloss | Target elicited gloss |
| produced_keywords | Keywords observed in the response |
| non_manual_features | Facial/body non-manual annotations |
| signing_quality | Annotation confidence/quality |
| comments | Additional notes |
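The columns listed above can be read with the standard `csv` module. The following sketch loads the timing fields as floats and flags rows whose `duration` disagrees with `end_time - start_time`. The sample rows, the identifier formats, and the comma delimiter are invented for illustration; only the column names come from the table.

```python
import csv
import io

# Illustrative rows only: identifiers and values are made up, not taken from the dataset.
SAMPLE = """segment_id,session_id,participant_id,video_file,question_id,start_time,end_time,duration,prompt_type
seg_0001,S01,P01,P01_S01.mp4,Q001,12.40,15.10,2.70,lexical
seg_0002,S01,P01,P01_S01.mp4,Q002,20.00,27.50,6.00,open-ended
"""


def load_segments(fileobj):
    """Read segment rows and convert the timing columns to floats."""
    rows = []
    for row in csv.DictReader(fileobj):
        for key in ("start_time", "end_time", "duration"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


def inconsistent_durations(rows, tol=0.05):
    """Return segment_ids where duration differs from end_time - start_time by more than tol."""
    return [
        r["segment_id"]
        for r in rows
        if abs((r["end_time"] - r["start_time"]) - r["duration"]) > tol
    ]


rows = load_segments(io.StringIO(SAMPLE))
print(inconsistent_durations(rows))  # the second sample row is deliberately inconsistent
```

With the real file, the `io.StringIO(SAMPLE)` argument would be replaced by something like `open("annotations/segments.csv", newline="", encoding="utf-8")`, assuming UTF-8 encoding and a comma delimiter.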
segments_extra.csv
Contains annotations for supplementary or extended elicitation prompts.
questions.csv
Contains the primary elicitation question guide used during recordings.
questions_extra.csv
Contains supplementary elicitation prompts and extra lexical items.
recordings.csv
Contains technical metadata for all recordings.
Fields include:
FPS
Resolution
Codec
Duration
File size
Audio presence
Recording quality tier
Recording parts
participants.csv
Contains anonymous participant-level metadata when available.
No personally identifiable information is included.
The corpus includes recordings collected under heterogeneous conditions, including:
In-person recordings
Online recordings
Webcam recordings
Mobile recordings
Video quality varies across sessions with differences in:
Resolution
Frame rate
Compression
Lighting
Internet stability
Camera positioning
This variability reflects realistic communication conditions and may support robustness research in sign language recognition systems.
Videos were manually segmented into lexical productions and elicited responses.
Segments containing skipped prompts, unclear productions, and/or unusable recordings were excluded during quality control.
The dataset primarily contains expected gloss annotations corresponding to elicited target concepts.
These annotations may not always represent exact produced glosses because participants occasionally:
produced explanations instead of isolated lexical signs
used alternative lexical variants
expanded responses semantically
Therefore, gloss annotations should be interpreted as elicitation targets rather than fully verified production glosses.
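One way to act on this caveat is to treat `expected_gloss` as a weak label and flag segments where the target does not appear among the `produced_keywords`. The sketch below is only a heuristic under stated assumptions: it assumes `produced_keywords` is a delimited string (semicolon, comma, or pipe) and that glosses and keywords share a vocabulary up to letter case. Neither assumption is confirmed by the dataset description, and the example strings are invented.

```python
import re


def keyword_set(produced_keywords):
    """Split a delimited keyword string into a set of case-folded keywords."""
    parts = re.split(r"[;,|]", produced_keywords or "")
    return {p.strip().casefold() for p in parts if p.strip()}


def gloss_is_confirmed(expected_gloss, produced_keywords):
    """True if the elicitation target literally appears among the produced keywords."""
    return expected_gloss.strip().casefold() in keyword_set(produced_keywords)


# Invented examples for illustration only.
print(gloss_is_confirmed("FEBRE", "febre; dor de cabeça"))
print(gloss_is_confirmed("FEBRE", "temperatura alta; calor"))
```

A segment that fails this check is not necessarily mislabelled; it may simply contain a lexical variant or an explanation, so manual review remains necessary.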
Some segments include annotations describing non-manual features such as:
pain expressions
discomfort expressions
emphasis
affective markers
Example annotations:
pain face
disgust face
low energy face
The dataset includes recordings with:
| Property | Values |
|---|---|
| Resolution | 1920x1080, 1280x720, 640x360 |
| FPS | 23.98, 24, 25, 29.97, 59.94 |
| Codec | H.264 |
| Format | MP4 |
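Because frame rates differ across recordings, segment times in seconds map to different frame indices per file. The sketch below converts a `start_time`/`end_time` pair into a half-open frame range for a given frame rate. It assumes a constant frame rate per recording and uses the nominal FPS values from the table; the function name is hypothetical, and real files may need the frame rate reported by a video decoder instead.

```python
import math


def segment_frame_range(start_time, end_time, fps):
    """Half-open frame range [first, last) covering a segment given in seconds.

    Assumes a constant frame rate; a small epsilon guards against
    floating-point error at exact frame boundaries.
    """
    eps = 1e-9
    first = math.floor(start_time * fps + eps)
    last = math.ceil(end_time * fps - eps)
    return first, max(last, first + 1)  # always include at least one frame


for fps in (23.98, 24, 25, 29.97, 59.94):
    print(fps, segment_frame_range(2.0, 3.5, fps))
```

Rounding the start down and the end up keeps the whole annotated interval inside the extracted frames, at the cost of possibly including one extra frame on either side.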
The dataset includes some limitations that should be considered during use and analysis. Subtitle timestamps may not perfectly align with sign boundaries, and expected gloss annotations may differ from the exact signs produced by participants. Some recordings contain internet artifacts or compression noise, and recording quality may vary across sessions. In addition, certain participants produced descriptive responses rather than isolated lexical concepts, reflecting natural variation in communication styles.