Task: ASR
Release Date: 7/2/2026
Format: MP3, TSV
Size: 2.11 GB
This dataset is a subset of the larger Spanish Mozilla Common Voice Scripted Speech 26.0 dataset. It contains only validated audio (at least one upvote and no downvotes) from speakers whose self-identified accent is from Mexico and who self-identify as male.
Restrictions/Special Constraints
N/A
Forbidden Usage
You agree not to attempt to determine the identity of speakers in this dataset. You agree not to rehost this dataset.
Intended Use
Speech technology
This dataset was created by filtering the train, dev, and test files from the v26 Spanish Mozilla Common Voice Scripted Speech dataset with the following conditions:
the value of the up_votes column is a number greater than 0.
the value of the down_votes column is 0.
the value of the gender column is "male_masculine".
the value of the accents column includes one of the following:
{'Acento del centro de México, mejor conocido como Chilango',
'Acento mexico latino del centro del',
'CDMX, México',
'Cdmx',
'Ciudad de Mexico sin entonación',
'Ciudad de México',
'De la ciudad de México ',
'Español de México',
'Mexicano',
'Mexicano central',
'Mexico City',
'Mexico: Aguascalientes ',
'Mexico: Ciudad de Mexico',
'México',
'México centro',
'México. Centro del país. Aguascalientes, Ags.',
'México: Centro (Ciudad de México)',
'Norte de Mexico',
'North Mexico',
'Oaxaca, México ',
'San Luis Potosí ',
'español latino México ',
'mexico city accent'}
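The filter conditions above can be sketched in Python roughly as follows. This is a minimal illustration, not the script used to build the dataset: the function names `keep_row` and `filter_rows` are hypothetical, and the `|` separator for multiple accent labels in one cell is an assumption that should be checked against the source TSV files. Accent labels are compared verbatim, including the trailing spaces that some of them carry.

```python
import csv
import io

# Accent labels copied verbatim from the list above (trailing spaces included).
MEXICAN_ACCENTS = {
    'Acento del centro de México, mejor conocido como Chilango',
    'Acento mexico latino del centro del',
    'CDMX, México',
    'Cdmx',
    'Ciudad de Mexico sin entonación',
    'Ciudad de México',
    'De la ciudad de México ',
    'Español de México',
    'Mexicano',
    'Mexicano central',
    'Mexico City',
    'Mexico: Aguascalientes ',
    'Mexico: Ciudad de Mexico',
    'México',
    'México centro',
    'México. Centro del país. Aguascalientes, Ags.',
    'México: Centro (Ciudad de México)',
    'Norte de Mexico',
    'North Mexico',
    'Oaxaca, México ',
    'San Luis Potosí ',
    'español latino México ',
    'mexico city accent',
}


def keep_row(row, separator="|"):
    """Return True if a TSV row satisfies all four conditions.

    `separator` is an ASSUMPTION about how several accent labels
    are joined inside a single `accents` cell.
    """
    try:
        up_votes = int(row["up_votes"])
        down_votes = int(row["down_votes"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-numeric vote counts never qualify
    if up_votes <= 0 or down_votes != 0:
        return False
    if row.get("gender") != "male_masculine":
        return False
    labels = (row.get("accents") or "").split(separator)
    return any(label in MEXICAN_ACCENTS for label in labels)


def filter_rows(tsv_file, separator="|"):
    """Read an open TSV file object and return the rows that pass keep_row."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if keep_row(row, separator)]
```

Passing an open file object rather than a path keeps the sketch independent of where the train, dev, and test files are stored.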
The resulting dataset includes 75,386 clips in the training set, 456 clips in the dev set, and 422 clips in the test set, totaling approximately 103 hours of audio.
The dataset follows the Mozilla Common Voice format: the clips directory contains all of the .mp3 files, and each data partition has its own TSV file with the following fields:
client_id
path
sentence_id
sentence
sentence_domain
up_votes
down_votes
age
gender
accents
variant
locale
segment
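As an illustration of this layout, the sketch below reads one partition file and pairs each transcript with the location of its clip. The helper name `read_partition` is hypothetical, the directory name `clips` is taken from the description above, and the last two column names are assumed to be `locale` and `segment`, as in other Common Voice releases.

```python
import csv
import io
from pathlib import Path

# Column names as listed above; `locale` and `segment` are assumed
# to be two separate columns.
EXPECTED_FIELDS = [
    "client_id", "path", "sentence_id", "sentence", "sentence_domain",
    "up_votes", "down_votes", "age", "gender", "accents", "variant",
    "locale", "segment",
]


def read_partition(tsv_file, clips_dir="clips"):
    """Yield (clip_path, sentence) pairs from an open partition TSV file."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        # The `path` column holds the .mp3 file name inside the clips directory.
        yield Path(clips_dir) / row["path"], row["sentence"]
```

A typical call would open a partition file with `open(..., encoding="utf-8", newline="")` and iterate over `read_partition(f)`. The actual TSV file names are not stated here, so any name used in such a call is an assumption.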
Age distribution by partition:

| Age group | Train (n) | Train (%) | Dev (n) | Dev (%) | Test (n) | Test (%) |
|---|---|---|---|---|---|---|
| Teens | 6,379 | 8% | 39 | 9% | 36 | 9% |
| Twenties | 63,599 | 84% | 242 | 53% | 208 | 49% |
| Thirties | 2,415 | 3% | 110 | 24% | 94 | 22% |
| Forties | 1,691 | 2% | 54 | 12% | 52 | 12% |
| Fifties | 1,221 | 2% | 10 | 2% | 25 | 6% |
| Sixties | 15 | 0% | 0 | 0% | 5 | 1% |
| Seventies | 64 | 0% | 0 | 0% | 0 | 0% |
| Eighties | 0 | 0% | 0 | 0% | 1 | 0% |
| Nineties | 0 | 0% | 0 | 0% | 0 | 0% |
| Total | 75,384 | 100% | 455 | 100% | 421 | 100% |
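A breakdown like the one in this table could be recomputed from a partition's rows with a short tally. The following is a sketch only: the function name `age_distribution` is hypothetical, and skipping rows with an empty `age` value is an assumption about how the table was built, not something the description confirms.

```python
from collections import Counter


def age_distribution(rows):
    """Map each age value to (count, rounded percentage of rows with an age).

    `rows` is any iterable of dicts with an "age" key, for example the
    rows produced by csv.DictReader for one partition.
    ASSUMPTION: rows whose age is empty are left out of the totals.
    """
    counts = Counter(row["age"] for row in rows if row.get("age"))
    total = sum(counts.values())
    return {age: (n, round(100 * n / total)) for age, n in counts.items()}
```

Percentages are rounded to whole numbers, matching the precision shown in the table.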