Task: ASR
Release Date: 7/2/2026
Format: MP3, TSV
Size: 1.65 GB
This dataset is a subset of the larger Spanish Mozilla Common Voice Scripted Speech 26.0 dataset. It contains only validated audio (at least one upvote and 0 downvotes) from speakers whose self-identified accent is from Mexico and who self-identify as female.
Restrictions/Special Constraints
NA
Forbidden Usage
You agree not to attempt to determine the identity of speakers in this dataset.
Intended Use
Speech technology; linguistics research
This dataset was created by filtering the train, dev, and test files from the v26 Spanish Mozilla Common Voice Scripted Speech dataset with the following conditions:
the value of the up_votes column is a number greater than 0.
the value of the down_votes column is 0.
the value of the gender column is "female_feminine".
the value of the accents column is one of the following:
{'Acento del centro de México, mejor conocido como Chilango',
'Acento mexico latino del centro del',
'CDMX, México',
'Cdmx',
'Ciudad de Mexico sin entonación',
'Ciudad de México',
'De la ciudad de México ',
'Español de México',
'Mexicano',
'Mexicano central',
'Mexico City',
'Mexico: Aguascalientes ',
'Mexico: Ciudad de Mexico',
'México',
'México centro',
'México. Centro del país. Aguascalientes, Ags.',
'México: Centro (Ciudad de México)',
'Norte de Mexico',
'North Mexico',
'Oaxaca, México ',
'San Luis Potosí ',
'español latino México ',
'mexico city accent'}
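The four conditions above can be sketched as a row filter. This is a minimal illustration, not the script used to build the dataset: the names `keep_row`, `filter_rows`, and `MEXICAN_ACCENTS` are hypothetical, the accent set is abbreviated to four of the listed values, and the sample rows are invented. It assumes the accents column is compared by exact string match, so trailing spaces in the listed values matter.

```python
import csv
import io

# Abbreviated stand-in for the accent set listed above. A real filter would
# use every listed string, copied exactly (some end with a trailing space).
MEXICAN_ACCENTS = {"México", "Mexicano", "Ciudad de México", "Oaxaca, México "}


def keep_row(row, accents=MEXICAN_ACCENTS):
    """Return True if a TSV row (as a dict) satisfies all four conditions."""
    try:
        up_votes = int(row["up_votes"])
        down_votes = int(row["down_votes"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-numeric vote counts are rejected
    return (
        up_votes > 0
        and down_votes == 0
        and row.get("gender") == "female_feminine"
        and row.get("accents") in accents  # exact match, spaces included
    )


def filter_rows(tsv_file, accents=MEXICAN_ACCENTS):
    """Read an open TSV file and return the rows that pass keep_row."""
    # QUOTE_NONE treats quotation marks inside sentences as ordinary text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if keep_row(row, accents)]


# Invented sample rows: only "a" and "f" meet every condition.
sample = io.StringIO(
    "client_id\tup_votes\tdown_votes\tgender\taccents\n"
    "a\t2\t0\tfemale_feminine\tMéxico\n"
    "b\t2\t1\tfemale_feminine\tMéxico\n"
    "c\t0\t0\tfemale_feminine\tMéxico\n"
    "d\t3\t0\tmale_masculine\tMéxico\n"
    "e\t3\t0\tfemale_feminine\tEspaña\n"
    "f\t1\t0\tfemale_feminine\tOaxaca, México \n"
)
kept = filter_rows(sample)
print([row["client_id"] for row in kept])  # ['a', 'f']
```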
The resulting dataset includes 59,560 clips in the training set, 158 clips in the dev set, and 186 clips in the test set, totaling approximately 84 hours of audio.
The dataset follows the Mozilla Common Voice format: the clips directory contains all of the .mp3 files, and each data partition has its own TSV file with the following fields:
client_id
path
sentence_id
sentence
sentence_domain
up_votes
down_votes
age
gender
accents
variant
locale
segment
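As a rough sketch of how this layout might be read, the snippet below pairs each row of a partition file with its audio file. It rests on assumptions not stated in this card: that the `path` field holds the clip's file name inside the clips directory, and that the directory is literally named `clips`. The function name `iter_clips` and the demo clip name are made up for illustration.

```python
import csv
import io
from pathlib import Path


def iter_clips(tsv_file, clips_dir="clips"):
    """Yield (audio path, sentence) for each row of an open partition TSV."""
    # QUOTE_NONE keeps quotation marks inside sentences as ordinary text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumption: `path` is a file name relative to the clips directory.
        yield Path(clips_dir) / row["path"], row["sentence"]


# Tiny in-memory stand-in for a partition file; the clip name is invented.
demo = io.StringIO(
    "client_id\tpath\tsentence\n"
    "abc\tclip_0001.mp3\tHola, ¿cómo estás?\n"
)
pairs = list(iter_clips(demo))
```

With a real partition, the file would be opened with `newline=""` and `encoding="utf-8"` and passed in the same way; the actual partition file names are not given here, so check the archive contents first.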
The table below gives the number and share of clips per speaker age group in each partition:

| Age group | Train (n) | Train (%) | Dev (n) | Dev (%) | Test (n) | Test (%) |
|---|---|---|---|---|---|---|
| Teens | 7,877 | 13% | 10 | 6% | 11 | 6% |
| Twenties | 36,878 | 62% | 85 | 54% | 133 | 72% |
| Thirties | 11,971 | 20% | 23 | 15% | 28 | 15% |
| Fourties | 1,049 | 2% | 19 | 12% | 10 | 5% |
| Fifties | 1,708 | 3% | 21 | 13% | 0 | 0% |
| Sixties | 77 | 0% | 0 | 0% | 4 | 2% |
| Seventies | 0 | 0% | 0 | 0% | 0 | 0% |
| Eighties | 0 | 0% | 0 | 0% | 0 | 0% |
| Nineties | 0 | 0% | 0 | 0% | 0 | 0% |
| Total | 59,560 | 100% | 158 | 100% | 186 | 100% |
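The percentages in the table can be recomputed from the age column of a partition. The sketch below is one way to do that, with a hypothetical helper `age_breakdown`. The lowercase age labels are an assumption about how the values appear in the TSV files, and the check rebuilds the Train column from the counts in the table rather than reading real data.

```python
from collections import Counter


def age_breakdown(rows):
    """Map each age label to (clip count, rounded percentage of all rows)."""
    counts = Counter(row["age"] for row in rows)
    total = sum(counts.values())
    return {age: (n, round(100 * n / total)) for age, n in counts.items()}


# Train counts copied from the table; label spellings are assumed.
train_counts = {
    "teens": 7877,
    "twenties": 36878,
    "thirties": 11971,
    "fourties": 1049,
    "fifties": 1708,
    "sixties": 77,
}
# Expand the counts into one row per clip, as a partition TSV would have.
rows = ({"age": age} for age, n in train_counts.items() for _ in range(n))
breakdown = age_breakdown(rows)
print(breakdown["twenties"])  # (36878, 62)
```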