Common Voice v24 English - en-AU subset for Everything Open 2026

Description

This is a subset of Common Voice v24 English, filtered for Australian-clustered accents. It is designed for use with the hands-on tutorial delivered at Everything Open 2026 in Canberra, Australia.

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Considerations

Restrictions/Special Constraints

Forbidden Usage

It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.

Processes

Ethical Review

This dataset is a subset of Common Voice. The Common Voice data collection process is documented at: https://commonvoice.mozillafoundation.org

Tutorial information

Everything Open: https://2026.everythingopen.au
Tutorial overview: https://2026.everythingopen.au/schedule/presentation/6/
Tutorial GitHub repo: https://github.com/Mozilla-Data-Collective/tutorial-whisper-fine-tuning-australian-EO2026

Preprocessing information

This dataset was extracted from Common Voice v24 English by filtering on the accents field, after assessing which accent labels in the dataset relate to Australian English.
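An accent filter of this kind can be sketched as follows. This is a minimal illustration, not the script actually used to build this subset. It assumes a Common Voice-style tab-separated metadata file with an accents column, assumes that multiple self-described accents in one cell are separated by "|", and uses the labels listed under Accents represented as a hypothetical AU_ACCENTS set.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical label set, copied from "Accents represented" in this card.
AU_ACCENTS = {
    "Australian English",
    "General Australian",
    "South Australia",
    "Educated Australian Accent",
    "Sydney - middle eastern seaboard Australian",
    "Queenslandish",
}


def has_au_accent(accents_field: str, wanted: set[str] = AU_ACCENTS) -> bool:
    """Return True if any accent label in the cell is in `wanted`.

    Assumes multiple accents in one cell are separated by "|".
    """
    labels = {part.strip() for part in accents_field.split("|") if part.strip()}
    return not labels.isdisjoint(wanted)


def filter_au_rows(tsv_lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield rows of a tab-separated metadata file whose accents match."""
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if has_au_accent(row.get("accents") or ""):
            yield row


# Tiny in-memory example standing in for a real metadata file.
sample = (
    "client_id\tpath\taccents\n"
    "abc\tclip1.mp3\tGeneral Australian\n"
    "def\tclip2.mp3\tScottish English\n"
)
print([r["path"] for r in filter_au_rows(io.StringIO(sample))])  # ['clip1.mp3']
```

Matching on a whole label, rather than searching for a substring such as "Australia", keeps the selection tied to an explicit, reviewable list of accent values.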

The duration of each clip was also calculated to help identify very long or very short clips; it is stored in milliseconds in the field duration_ms.
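The duration_ms field makes it simple to set outliers aside before training. The sketch below assumes only that the CSV has a header row containing duration_ms. The split_by_duration helper is hypothetical, and its 1,000 ms and 15,000 ms bounds are arbitrary illustrative thresholds, not values used for this dataset or the tutorial.

```python
import csv
import io
from typing import Iterable


def split_by_duration(
    rows: Iterable[dict[str, str]],
    min_ms: float = 1_000,   # hypothetical lower bound
    max_ms: float = 15_000,  # hypothetical upper bound
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Partition rows into (kept, outliers) using the duration_ms column."""
    kept: list[dict[str, str]] = []
    outliers: list[dict[str, str]] = []
    for row in rows:
        duration = float(row["duration_ms"])
        (kept if min_ms <= duration <= max_ms else outliers).append(row)
    return kept, outliers


# In-memory stand-in for the dataset CSV.
sample = "path,duration_ms\na.mp3,450\nb.mp3,4200\nc.mp3,21000\n"
kept, outliers = split_by_duration(csv.DictReader(io.StringIO(sample)))
print([r["path"] for r in kept])      # ['b.mp3']
print([r["path"] for r in outliers])  # ['a.mp3', 'c.mp3']
```

Returning the outliers instead of discarding them makes it easy to listen to a few before deciding on final thresholds.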

File structure

audios: directory containing the audio files, named id.mp3, where id is the unique identifier of the clip.
commonvoice-v24_en-AU.csv: CSV-formatted metadata file, with the fields listed below.

The CSV fields are:

original row ID from Common Voice v24 English
client_id: unique identifier for each speaker
path: the filename of the audio file
sentence_id: a unique identifier for each written sentence
sentence_domain: a string description of the topic domain of the sentence (may be null)
up_votes: integer indicating how many up votes this clip has
down_votes: integer indicating how many down votes this clip has; can be used to exclude poorly rated clips
age: age range of speaker, if provided (may be null)
gender: gender identity of speaker, if provided (may be null)
accents: accent descriptor
locale: ISO 639 language code (all samples in this dataset are en)
segment: not applicable to this dataset, included to provide interoperability
duration_ms: duration in milliseconds of the audio file, calculated using librosa
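The fields above can be loaded into typed records with Python's standard csv module, as sketched below. The Clip type, load_clips, and keep_clip are hypothetical helpers, not part of the dataset or the tutorial repo. The sketch assumes null values are written as empty cells, that path is a bare filename stored under audios/, and that the first column (the original row ID) has an empty header; columns it does not map, such as that row ID and segment, are ignored. The keep_clip rule of allowing no down votes is an illustrative choice only.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class Clip:
    """One row of commonvoice-v24_en-AU.csv (hypothetical helper type)."""
    client_id: str
    path: str
    sentence_id: str
    sentence_domain: Optional[str]
    up_votes: int
    down_votes: int
    age: Optional[str]
    gender: Optional[str]
    accents: str
    locale: str
    duration_ms: float

    def audio_file(self, root: Path) -> Path:
        # Assumes `path` is the bare filename stored under <root>/audios/.
        return root / "audios" / self.path


def _opt(value: Optional[str]) -> Optional[str]:
    # Assumes nulls are serialised as empty cells in the CSV.
    return value or None


def load_clips(lines: Iterable[str]) -> list[Clip]:
    """Parse CSV lines into Clip records; unmapped columns are ignored."""
    clips = []
    for row in csv.DictReader(lines):
        clips.append(
            Clip(
                client_id=row["client_id"],
                path=row["path"],
                sentence_id=row["sentence_id"],
                sentence_domain=_opt(row.get("sentence_domain")),
                up_votes=int(row["up_votes"]),
                down_votes=int(row["down_votes"]),
                age=_opt(row.get("age")),
                gender=_opt(row.get("gender")),
                accents=row["accents"],
                locale=row["locale"],
                duration_ms=float(row["duration_ms"]),
            )
        )
    return clips


def keep_clip(clip: Clip, max_down_votes: int = 0) -> bool:
    # Hypothetical exclusion rule: drop clips with too many down votes.
    return clip.down_votes <= max_down_votes


# First column stands in for the original row ID; its header is assumed empty.
sample = (
    ",client_id,path,sentence_id,sentence_domain,up_votes,down_votes,"
    "age,gender,accents,locale,segment,duration_ms\n"
    "0,spk1,clip1.mp3,s1,,2,0,thirties,,General Australian,en,,4320.0\n"
    "1,spk2,clip2.mp3,s2,,3,1,,,Queenslandish,en,,5100.0\n"
)
clips = load_clips(io.StringIO(sample))
print([c.path for c in clips if keep_clip(c)])  # ['clip1.mp3']
```

Against the real file, open it with newline="" (as the csv module documentation recommends) and pass the file object to load_clips.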

Composition

This dataset comprises 55,673 rows of Australian-accented elicited (read) English speech.

The total length of time is approximately 4.68 minutes.
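Because every row carries duration_ms, the total duration can be recomputed directly from the CSV and reported in any convenient unit. A minimal sketch, assuming only a header row containing duration_ms; total_duration_hours is a hypothetical helper name.

```python
import csv
import io
from typing import Iterable


def total_duration_hours(rows: Iterable[dict[str, str]]) -> float:
    """Sum the duration_ms column and convert milliseconds to hours."""
    total_ms = sum(float(row["duration_ms"]) for row in rows)
    return total_ms / 1000 / 3600


# In-memory stand-in: one clip of 1 hour and one of 30 minutes.
sample = "path,duration_ms\na.mp3,3600000\nb.mp3,1800000\n"
print(total_duration_hours(csv.DictReader(io.StringIO(sample))))  # 1.5
```

To run it on the dataset itself, open commonvoice-v24_en-AU.csv with newline="" and pass csv.DictReader over that file object to total_duration_hours.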

Accents represented

Australian English
General Australian
South Australia
Educated Australian Accent
Sydney - middle eastern seaboard Australian
Queenslandish