Task: ASR
Release Date: 7/3/2026
Format: MP3, TSV
Size: 426.64 MB
This dataset is a subset of the larger Spanish Mozilla Common Voice Scripted Speech 26.0 dataset. It contains only validated audio (at least one upvote and no downvotes) from users whose self-identified accent is Rioplatense, or who specifically identified as being from Argentina or Uruguay.
Restrictions/Special Constraints
N/A
Forbidden Usage
You agree not to attempt to determine the identity of speakers in this dataset.
Intended Use
Speech technology; Linguistics research
This dataset was created by filtering the train, dev, and test files from the v26 Spanish Mozilla Common Voice Scripted Speech dataset with the following conditions:
the value of the up_votes field is a number greater than 0.
the value of the down_votes field is 0.
the value of the accents column is one of the following:
{"Rioplatense: Argentina, Uruguay, este de Bolivia, Paraguay",
"Argentino porteño",
"Argentinean",
"Cordobés: Argentina",
"Argentina: Córdoba",
"Soy argentina, hablante de la variedad de español del Río de la Plata.",
"Argentina, acento entrerriano",
"Español (Argentina)",
"Español latinoamericano argentina",
"Argentino",
"Uruguayan",
"Rioplatense",
"Paysandù, Uruguay",
"Español de Argentina",
"Argentina: Del interior profundo de la provincia de Misiones.",
"Argentina: Cordobés (Córdoba Capital) (Clase Media)",
"Argentina: Catamarca - Norte Argentino. Catamarqueño."}
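The vote and accent conditions above can be sketched as a small row filter. This is a minimal illustration, not the script used to build the dataset: the function names are hypothetical, the sample rows are made up, and it assumes the accents value must match one of the listed strings exactly (the source does not say whether matching was exact).

```python
import csv
import io

# Accent labels copied verbatim from the list above.
RIOPLATENSE_ACCENTS = {
    "Rioplatense: Argentina, Uruguay, este de Bolivia, Paraguay",
    "Argentino porteño",
    "Argentinean",
    "Cordobés: Argentina",
    "Argentina: Córdoba",
    "Soy argentina, hablante de la variedad de español del Río de la Plata.",
    "Argentina, acento entrerriano",
    "Español (Argentina)",
    "Español latinoamericano argentina",
    "Argentino",
    "Uruguayan",
    "Rioplatense",
    "Paysandù, Uruguay",
    "Español de Argentina",
    "Argentina: Del interior profundo de la provincia de Misiones.",
    "Argentina: Cordobés (Córdoba Capital) (Clase Media)",
    "Argentina: Catamarca - Norte Argentino. Catamarqueño.",
}


def keep_row(row):
    """Return True if a row passes the vote and accent conditions."""
    try:
        up_votes = int(row["up_votes"])
        down_votes = int(row["down_votes"])
    except (KeyError, ValueError):
        return False  # missing or non-numeric vote counts are rejected
    # Assumption: exact string match against the accent list.
    return up_votes > 0 and down_votes == 0 and row.get("accents", "") in RIOPLATENSE_ACCENTS


def filter_rows(tsv_file):
    """Read a tab-separated file object and return the rows that pass keep_row."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if keep_row(row)]


# Made-up sample rows, for illustration only.
SAMPLE = (
    "client_id\tpath\tup_votes\tdown_votes\taccents\n"
    "a\tclip_1.mp3\t2\t0\tRioplatense\n"
    "b\tclip_2.mp3\t2\t1\tRioplatense\n"
    "c\tclip_3.mp3\t3\t0\tMéxico\n"
    "d\tclip_4.mp3\t0\t0\tUruguayan\n"
)
kept = filter_rows(io.StringIO(SAMPLE))
print([row["path"] for row in kept])  # ['clip_1.mp3']
```

Only the first sample row survives: the second has a downvote, the third has an accent outside the list, and the fourth has no upvotes.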
The resulting dataset includes 9,903 clips in the training set, 266 clips in the dev set, and 224 clips in the test set, totaling approximately 16 hours of audio.
The dataset follows the Mozilla Common Voice format: the clips directory contains all of the .mp3 files, and each data partition (train, dev, and test) has its own TSV file with the following fields:
client_id
path
sentence_id
sentence
sentence_domain
up_votes
down_votes
age
gender
accents
variant
locale
segment
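Given this layout, a partition can be read with Python's standard csv module and each row paired with its audio file. The sketch below is illustrative only: the file name pattern `<split>.tsv` and the helper `load_partition` are assumptions, and the demo row is made up.

```python
import csv
import tempfile
from pathlib import Path


def load_partition(root, split):
    """Read <root>/<split>.tsv and return (clip_path, sentence) pairs.

    Assumes the partition file is named <split>.tsv and that the path
    field holds a file name relative to the clips directory.
    """
    root = Path(root)
    pairs = []
    with open(root / f"{split}.tsv", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            pairs.append((root / "clips" / row["path"], row["sentence"]))
    return pairs


# Demo on a throwaway directory containing one made-up row.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "clips").mkdir()
    (demo_root / "dev.tsv").write_text(
        "client_id\tpath\tsentence\n"
        "abc\tcommon_voice_es_0001.mp3\tHola, ¿cómo estás?\n",
        encoding="utf-8",
    )
    pairs = load_partition(demo_root, "dev")
    print(pairs[0][0].name, "|", pairs[0][1])
```

`csv.QUOTE_NONE` is used so that quotation marks inside sentences are treated as ordinary characters rather than as field delimiters.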
Age distribution:
| Category | Train (n) | Train (%) | Dev (n) | Dev (%) | Test (n) | Test (%) |
|---|---|---|---|---|---|---|
| Not Specified | 74 | 1% | 8 | 3% | 57 | 25% |
| Teens | 546 | 6% | 18 | 7% | 27 | 12% |
| Twenties | 5,245 | 53% | 107 | 40% | 58 | 26% |
| Thirties | 1,518 | 15% | 81 | 30% | 15 | 7% |
| Fourties | 893 | 9% | 34 | 13% | 33 | 15% |
| Fifties | 1,103 | 11% | 8 | 3% | 22 | 10% |
| Sixties | 524 | 5% | 7 | 3% | 10 | 4% |
| Seventies | 0 | 0% | 3 | 1% | 2 | 1% |
| Total | 9,903 | 100% | 266 | 100% | 224 | 100% |
Gender distribution:
| Category | Train (n) | Train (%) | Dev (n) | Dev (%) | Test (n) | Test (%) |
|---|---|---|---|---|---|---|
| Not specified | 190 | 2% | 8 | 3% | 63 | 28% |
| Female_Feminine | 2,549 | 26% | 76 | 29% | 21 | 9% |
| Male_Masculine | 7,164 | 72% | 182 | 68% | 140 | 62% |
| Total | 9,903 | 100% | 266 | 100% | 224 | 100% |
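The count and percentage columns in these tables can be recomputed from any metadata column. The following is a minimal sketch: the `distribution` helper is hypothetical, it assumes an empty field means the value was not specified, and it assumes lowercase labels such as `male_masculine` in the TSV files. The demo list is rebuilt from the Train counts in the gender table rather than read from the data.

```python
from collections import Counter


def distribution(values):
    """Map each category to (count, percentage rounded to a whole number)."""
    # Assumption: an empty string marks an unspecified value.
    counts = Counter(v if v else "not specified" for v in values)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}


# Column rebuilt from the Train counts in the gender table above.
train_gender = (
    ["male_masculine"] * 7164
    + ["female_feminine"] * 2549
    + [""] * 190
)
dist = distribution(train_gender)
for label, (n, pct) in sorted(dist.items()):
    print(f"{label}: {n} ({pct}%)")
```

Because each percentage is rounded independently, a column of percentages may sum to 99 or 101 rather than exactly 100.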