Task: ASR
Release Date: 1/21/2026
Format: CSV, MP3
Size: 1.92 GB
This is a subset of Common Voice v24 English filtered for Australian-clustered accents. It is designed to be used with the hands-on tutorial delivered at Everything Open 2026 in Canberra, Australia.
Restrictions/Special Constraints
None specified.
Forbidden Usage
It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.
Ethical Review
This is a subset of Common Voice; the Common Voice data collection process is documented at: https://commonvoice.mozillafoundation.org
Intended Use
This dataset is intended for fine-tuning automatic speech recognition systems to improve acoustic modelling of Australian-accented English. It does _not_ contain samples of **lexical** (vocabulary) variation in Australian English.
Everything Open: https://2026.everythingopen.au
Tutorial overview: https://2026.everythingopen.au/schedule/presentation/6/
Tutorial GitHub repo: https://github.com/Mozilla-Data-Collective/tutorial-whisper-fine-tuning-australian-EO2026
This dataset was extracted from Common Voice v24 English by filtering on the accent field, after reviewing which accent labels in the dataset relate to Australian English.
The duration of each clip was also calculated and stored, in milliseconds, in the field duration_ms, to help identify very long or very short clips.
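A minimal sketch of using duration_ms to drop very short or very long clips, assuming each CSV row has been read as a dict of strings (as `csv.DictReader` produces). The helper name and both thresholds are hypothetical; the 30 s upper bound is chosen only because Whisper processes audio in 30-second windows.

```python
from typing import Iterable, Iterator


def filter_by_duration(
    rows: Iterable[dict[str, str]],
    min_ms: int = 1_000,   # hypothetical lower bound: drop clips under 1 s
    max_ms: int = 30_000,  # hypothetical upper bound: Whisper's 30 s input window
) -> Iterator[dict[str, str]]:
    """Yield only the rows whose duration_ms lies within [min_ms, max_ms]."""
    for row in rows:
        # CSV values arrive as strings; float() also tolerates "4320.0".
        duration = int(float(row["duration_ms"]))
        if min_ms <= duration <= max_ms:
            yield row
```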
The dataset contains:
audios: a directory of audio files named id.mp3, where id is the unique identifier of the clip.
commonvoice-v24_en-AU.csv: a CSV file with one row of metadata per clip.
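One way to pair each metadata row with its MP3 is sketched below, assuming the `path` field holds a bare filename stored directly inside the audios directory. The function name, the added `audio_file` key and the local directory layout in the usage comment are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable


def parse_clips(lines: Iterable[str], audio_dir: Path) -> list[dict[str, str]]:
    """Parse metadata CSV lines and attach the full path of each clip's MP3."""
    clips = []
    for row in csv.DictReader(lines):
        # Assumes `path` is a filename such as "<id>.mp3" inside audio_dir.
        row["audio_file"] = str(audio_dir / row["path"])
        clips.append(row)
    return clips


# Usage (hypothetical local copy of the dataset):
# root = Path("commonvoice-v24_en-AU")
# with open(root / "commonvoice-v24_en-AU.csv", newline="", encoding="utf-8") as f:
#     clips = parse_clips(f, root / "audios")
```

Accepting any iterable of lines keeps the parser independent of where the CSV lives, so it works equally with an open file or an in-memory list.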
The CSV fields are:
original row ID from Common Voice v24 English
client_id: unique identifier for each speaker
path: the filename of the audio file
sentence_id: a unique identifier for each written sentence
sentence_domain: a string description of the topic domain of the sentence (may be null)
up_votes: integer indicating how many up votes this clip has
down_votes: integer indicating how many down votes this clip has; can be used to exclude poorly rated clips
age: age range of speaker, if provided (may be null)
gender: gender identity of speaker, if provided (may be null)
accents: accent descriptor provided by the speaker
locale: ISO 639 locale code (all samples in this dataset are en)
segment: not applicable to this dataset, included to provide interoperability
duration_ms: duration in milliseconds of the audio file, calculated using librosa
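The up_votes and down_votes fields can drive a simple exclusion rule. The sketch below keeps a clip only when its up votes exceed its down votes by a margin; both the rule and the default margin are illustrative assumptions, not the criterion used to build this subset.

```python
def passes_votes(row: dict[str, str], min_margin: int = 1) -> bool:
    """Return True if up_votes exceeds down_votes by at least min_margin.

    Hypothetical rule: adjust min_margin to trade recall for label quality.
    """
    up = int(row["up_votes"])      # CSV values are strings
    down = int(row["down_votes"])
    return up - down >= min_margin
```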
This dataset comprises 55,673 rows of Australian-accented elicited (read) English speech.
The total length of time is approximately 4.68 minutes.
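The total length can be recomputed directly from the duration_ms column. A minimal sketch, assuming rows are dicts of strings as read by `csv.DictReader`; the helper name is hypothetical.

```python
from typing import Iterable


def total_duration_hours(rows: Iterable[dict[str, str]]) -> float:
    """Sum duration_ms over all rows and convert milliseconds to hours."""
    total_ms = sum(float(row["duration_ms"]) for row in rows)
    return total_ms / 1000 / 3600  # ms -> s -> h
```

Multiplying the result by 60 gives the same total in minutes.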
Accent labels included in this subset:
Australian English
General Australian
South Australia
Educated Australian Accent
Sydney - middle eastern seaboard Australian
Queenslandish
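The accent filtering step can be approximated as below, using the labels listed above. This is a sketch rather than the extraction code actually used; in particular, the assumption that multiple self-reported accents are joined with "|" should be checked against your copy of Common Voice.

```python
AUSTRALIAN_ACCENTS = {
    "Australian English",
    "General Australian",
    "South Australia",
    "Educated Australian Accent",
    "Sydney - middle eastern seaboard Australian",
    "Queenslandish",
}


def has_australian_accent(accents_field: str, sep: str = "|") -> bool:
    """Return True if any accent in the field matches an Australian label.

    Assumes multiple accents are joined with `sep` (hypothetical default).
    """
    labels = (part.strip() for part in accents_field.split(sep))
    return any(label in AUSTRALIAN_ACCENTS for label in labels)
```

Matching whole labels rather than substrings avoids accidentally selecting unrelated accents that merely mention "Australia".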