License:
CC-BY-NC-SA-4.0
Steward:
Community
Dataset ID:
cmr0mhm4l01agns077lq6a5hf
Task: TTS
Release Date: 6/30/2026
Format: WAV, WEBM, TSV, TGZ
Size: 328.77 MB
A single-speaker read-speech dataset in Ligurian. The dataset contains ~6 hours of pre-segmented utterances recorded by an anonymous, ~50-year-old female native speaker of Ligurian from the Liguria region. Sentences were prompted from a script. The archive includes .webm audio files together with a metadata TSV listing the transcription, file path, and duration of each recording.
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Restrictions/Special Constraints
No restrictions
Forbidden Usage
No forbidden usage
Ethical Review
Participants are fully aware of the study's purpose. They have been instructed about the Mozilla Data Collective initiative.
Intended Use
This dataset is intended for use in creating automatic speech generation and recognition systems.
The speaker is an anonymous ~50-year-old woman from the Liguria region. She is a native speaker.
The fragments are excerpts from the Friulian version of Wikipedia.
Failed attempts are collected in a separate folder. It is up to the speaker to decide whether an attempt was a failure.
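As a rough illustration of working with the metadata TSV described above, the following Python sketch sums a per-recording duration column to check the total against the stated ~6 hours. The column names (`path`, `transcription`, `duration`), the assumption that durations are in seconds, and the sample rows are all hypothetical; the actual header in the archive may differ and should be checked first.

```python
import csv
import io


def total_duration_hours(tsv_file, duration_column="duration"):
    """Sum a per-recording duration column and return the total in hours.

    Assumes (hypothetically) that the column holds durations in seconds.
    """
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    total_seconds = 0.0
    for row in reader:
        total_seconds += float(row[duration_column])
    return total_seconds / 3600.0


# Tiny in-memory stand-in for the metadata file; rows are placeholders,
# not real entries from the dataset.
sample = io.StringIO(
    "path\ttranscription\tduration\n"
    "clips/0001.webm\tplaceholder text one\t3.5\n"
    "clips/0002.webm\tplaceholder text two\t4.5\n"
)
hours = total_duration_hours(sample)
```

With the real archive, the same function could be called on the opened metadata file, passing whatever the duration column is actually named.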