Task: ASR
Release Date: 7/1/2026
Format: MP3, TSV
Size: 325.83 MB
This dataset is a subset of the larger Spanish Mozilla Common Voice Scripted Speech 26.0 dataset. It contains only validated audio (at least one upvote and zero downvotes) from users whose accent is labeled as Caribbean ("Caribe: Cuba, Venezuela, Puerto Rico, República Dominicana, Panamá, Colombia caribeña, México caribeño, Costa del golfo de México").
Restrictions/Special Constraints
N/A
Forbidden Usage
You agree not to attempt to determine the identity of speakers in this dataset.
Intended Use
Speech technology
This dataset was created by filtering the train, dev, and test files of the Spanish Mozilla Common Voice 26.0 dataset, keeping only the rows that meet all of the following conditions:
the value of the up_votes field is a number greater than 0.
the value of the down_votes field is 0.
the value of the accents field includes "Caribe: Cuba, Venezuela, Puerto Rico, República Dominicana, Panamá, Colombia caribeña, México caribeño, Costa del golfo de México".
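The three conditions above amount to a simple row filter. The following is a minimal Python sketch using only the standard library; it assumes each source partition is a tab-separated file with a header row, and the helpers `keep_row` and `filter_tsv` are hypothetical names, not part of any Common Voice tooling.

```python
import csv
from typing import Iterable, Iterator

# Accent label copied verbatim from the dataset description.
CARIBBEAN_ACCENT = (
    "Caribe: Cuba, Venezuela, Puerto Rico, República Dominicana, Panamá, "
    "Colombia caribeña, México caribeño, Costa del golfo de México"
)


def keep_row(row: dict) -> bool:
    """Apply the up_votes, down_votes, and accents conditions to one row."""
    try:
        up = int(row["up_votes"])
        down = int(row["down_votes"])
    except (KeyError, TypeError, ValueError):
        # Assumption: rows with missing or non-numeric vote counts are dropped.
        return False
    accents = row.get("accents") or ""
    return up > 0 and down == 0 and CARIBBEAN_ACCENT in accents


def filter_tsv(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the rows of one tab-separated partition that pass keep_row."""
    # QUOTE_NONE: sentences may contain quotation marks that are not CSV quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if keep_row(row):
            yield row
```

For example, opening a source partition with `open(..., encoding="utf-8", newline="")` and passing the file object to `filter_tsv` would yield only the qualifying rows. The accent test is a substring check because the condition says the field *includes* the label, so rows listing additional accents are kept as well.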
The resulting dataset includes 7,152 clips in the training set, 259 clips in the dev set, and 277 clips in the test set (7,688 clips in total), amounting to approximately 12.5 hours of audio.
The dataset follows the Mozilla Common Voice format: the clips directory contains all of the MP3 files, and each data partition has its own TSV file containing the following fields:
client_id
path
sentence_id
sentence
sentence_domain
up_votes
down_votes
age
gender
accents
variant
locale
segment
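Because each partition TSV names its audio file in the `path` field, pairing audio with transcripts for ASR is a simple join. The sketch below assumes a layout of `<root>/<partition>.tsv` alongside `<root>/clips/`, and that `path` stores a bare file name; these file names and the helpers `examples_from_lines` and `iter_examples` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def examples_from_lines(lines: Iterable[str], root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (mp3_path, sentence) pairs from the lines of one partition TSV."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = {"path", "sentence"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"TSV lacks required columns: {sorted(missing)}")
    for row in reader:
        # Assumption: `path` holds a bare file name located under clips/.
        yield root / "clips" / row["path"], row["sentence"]


def iter_examples(root: Path, partition: str) -> Iterator[Tuple[Path, str]]:
    """Open <root>/<partition>.tsv (hypothetical file name) and yield its examples."""
    with (root / f"{partition}.tsv").open(encoding="utf-8", newline="") as f:
        yield from examples_from_lines(f, root)
```

Keeping the parsing in `examples_from_lines` separate from the file handling in `iter_examples` makes the column check and path join easy to test without touching disk.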
| Age | Train (n) | Train (%) | Dev (n) | Dev (%) | Test (n) | Test (%) |
|---|---|---|---|---|---|---|
| Not specified | 243 | 3% | 11 | 4% | 40 | 14% |
| Teens | 301 | 4% | 17 | 7% | 44 | 16% |
| Twenties | 3,693 | 52% | 115 | 44% | 88 | 32% |
| Thirties | 1,703 | 24% | 46 | 18% | 42 | 15% |
| Forties | 761 | 11% | 18 | 7% | 39 | 14% |
| Fifties | 279 | 4% | 38 | 15% | 16 | 6% |
| Sixties | 129 | 2% | 14 | 5% | 6 | 2% |
| Seventies | 0 | 0% | 0 | 0% | 2 | 1% |
| Eighties | 0 | 0% | 0 | 0% | 0 | 0% |
| Nineties | 43 | 1% | 0 | 0% | 0 | 0% |
| Total | 7,152 | 100% | 259 | 100% | 277 | 100% |

| Gender | Train (n) | Train (%) | Dev (n) | Dev (%) | Test (n) | Test (%) |
|---|---|---|---|---|---|---|
| Not specified | 256 | 4% | 32 | 12% | 50 | 18% |
| Male/Masculine | 5,490 | 77% | 158 | 61% | 195 | 70% |
| Female/Feminine | 1,406 | 20% | 69 | 27% | 32 | 12% |
| Total | 7,152 | 100% | 259 | 100% | 277 | 100% |
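The age and gender tables are per-partition tallies of the `age` and `gender` fields. A minimal sketch of how such a tally could be reproduced follows; it assumes rows have already been read as dictionaries (for example with `csv.DictReader`), treats a missing or empty value as "Not specified", and reports raw field values rather than the display labels used in the tables. The helper name `distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def distribution(values: Iterable[Optional[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each category to (count, whole-number percentage), most common first."""
    # Assumption: missing or empty values are reported as "Not specified".
    counts = Counter((v or "").strip() or "Not specified" for v in values)
    total = sum(counts.values())
    if total == 0:
        return {}
    # round() rounds exact halves to the nearest even integer.
    return {cat: (n, round(100 * n / total)) for cat, n in counts.most_common()}
```

A call such as `distribution(row["gender"] for row in rows)` would produce one column pair of the gender table. Because each percentage is rounded independently, the rounded shares need not add up to exactly 100: in both tables above, the Train (%) column sums to 101 even though the Total row correctly reports 100%.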