License:
CC-BY-NC-SA-4.0
Steward:
Community
Dataset ID:
cmr0mlgjp01bimk07xkdw256s
Task: TTS
Release Date: 6/30/2026
Format: WEBM, TSV, TXT, WAV
Size: 1.53 GB
A single-speaker read-speech dataset in Neapolitan. The dataset contains ~8 hours of pre-segmented utterances recorded by an anonymous ~25-year-old male native speaker of Neapolitan. Sentences were prompted from a script. The speaker is a native of the Campania region. The archive includes .webm audio files together with a metadata TSV containing the transcription, file path and duration of each recording.
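As a rough illustration of how the metadata TSV might be consumed, the Python sketch below counts utterances and sums their durations. The column names (`path`, `transcription`, `duration`), the assumption that durations are given in seconds, and the sample rows are all hypothetical and are not taken from the dataset; check the header of the actual TSV before relying on them.

```python
import csv
import io


def summarize_metadata(tsv_file, duration_column="duration"):
    """Return (utterance count, total duration in hours) for a metadata TSV.

    `duration_column` is a hypothetical column name; durations are assumed
    to be expressed in seconds.
    """
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    count = 0
    total_seconds = 0.0
    for row in reader:
        count += 1
        total_seconds += float(row[duration_column])
    return count, total_seconds / 3600.0


# Made-up rows standing in for the real metadata file.
sample = io.StringIO(
    "path\ttranscription\tduration\n"
    "clips/0001.webm\texample sentence one\t3.5\n"
    "clips/0002.webm\texample sentence two\t4.5\n"
)
count, hours = summarize_metadata(sample)
print(count, hours)
```

With the real archive, the same function could be called on an opened file, for example `open("metadata.tsv", encoding="utf-8", newline="")`, where the file name is again an assumption.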
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Restrictions/Special Constraints
No restrictions
Forbidden Usage
No forbidden usage
Ethical Review
Participants are fully aware of the study's purpose. They have been instructed about the Mozilla Data Collective initiative.
Intended Use
This dataset is intended for use in creating automatic speech generation and recognition systems.
The speaker is a ~25-year-old anonymous man from the Campania region. He's a native speaker.
The fragments are excerpts from the Neapolitan version of Wikipedia.
Failed attempts are collected in a separate folder. It is up to the speaker to decide whether an attempt was a failure.