License:
CC-BY-SA-4.0
Steward:
Community
Dataset ID:
cmp3rmip800ozo30799h7wp6g
Task: TTS
Release Date: 5/13/2026
Format: WEBM, TSV, TXT
Size: 336.10 MB
A single-speaker read-speech dataset in Friulian. The dataset contains ~5 hours of pre-segmented utterances recorded by an anonymous, ~30-year-old female native speaker of Friulian from the Friuli region. Sentences were prompted from a script. The archive includes .webm audio files together with a metadata TSV containing the transcription, file path and duration of each recording.
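As a rough illustration of how the metadata TSV described above might be consumed, the following Python sketch parses a tab-separated file and sums the clip durations. The column names (`path`, `sentence`, `duration`), the assumption that durations are given in seconds, and the sample rows are all hypothetical; check the actual TSV header in the archive and adjust accordingly.

```python
import csv
import io


def load_metadata(tsv_file, path_col="path", text_col="sentence", dur_col="duration"):
    """Parse a tab-separated metadata file into a list of dicts.

    The default column names are assumptions, not taken from the dataset;
    adjust them to match the real TSV header.
    """
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {
            "path": row[path_col],
            "text": row[text_col],
            "duration": float(row[dur_col]),  # assumed to be in seconds
        }
        for row in reader
    ]


def total_hours(rows):
    """Sum the per-clip durations (assumed seconds) and convert to hours."""
    return sum(r["duration"] for r in rows) / 3600.0


# Made-up placeholder rows, only to show the expected shape of the input.
sample = io.StringIO(
    "path\tsentence\tduration\n"
    "clips/0001.webm\tplaceholder sentence one\t4.5\n"
    "clips/0002.webm\tplaceholder sentence two\t2.7\n"
)
rows = load_metadata(sample)
print(len(rows), "clips,", round(total_hours(rows), 4), "hours")
```

For a real file, passing `open(tsv_path, encoding="utf-8", newline="")` instead of the in-memory sample should work the same way, where `tsv_path` is wherever the metadata file sits in the extracted archive.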
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Restrictions/Special Constraints
No restrictions
Forbidden Usage
No forbidden usage
Ethical Review
Participants are fully aware of the study's purpose. They have been instructed about the Mozilla Data Collective initiative.
Intended Use
This dataset is intended for use in creating automatic speech generation and recognition systems.
The speaker is an anonymous, ~30-year-old woman from the Friuli region. She is a native speaker of Friulian.
The fragments are excerpts from the Friulian version of Wikipedia.
Failed attempts are collected in a separate folder. It is up to the speaker to decide whether an attempt was a failure.