License: GPL-3.0
Steward: MDC Community Concierge
Task: ASR
Release Date: 5/8/2026
Format: WAV, TSV
Size: 143.71 MB
39 minutes of Catalan read speech, collected as part of the VoxForge project.
Licensing
GNU General Public License v3.0 or later (GPL-3.0)
https://spdx.org/licenses/GPL-3.0-or-later.html
Restrictions/Special Constraints
N/A
Forbidden Usage
N/A
Intended Use
ASR training and evaluation
The voice data was contributed by volunteers who read prompts out loud. For Catalan, there is just over 1 hour of recorded speech.
The following is a breakdown of the number of utterances per speaker (the "anonymous" entry likely covers multiple speakers):
| Speaker | Count |
|---|---|
| anonymous | 128 |
| duhow | 80 |
| Guillem | 60 |
| RainCT | 60 |
| Pere | 30 |
| hseara | 20 |
| rain | 20 |
| Kyngo | 10 |
| RogerR | 10 |
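As a quick sanity check on the table above, the per-speaker counts can be summed to get the total number of utterances and the share attributed to "anonymous". This is only a sketch that copies the counts from the table; the variable names are illustrative and not part of the dataset.

```python
# Utterance counts copied from the per-speaker table above.
utterances = {
    "anonymous": 128,
    "duhow": 80,
    "Guillem": 60,
    "RainCT": 60,
    "Pere": 30,
    "hseara": 20,
    "rain": 20,
    "Kyngo": 10,
    "RogerR": 10,
}

total = sum(utterances.values())
named = total - utterances["anonymous"]
anonymous_share = utterances["anonymous"] / total

print(total)                      # 418
print(named)                      # 290
print(f"{anonymous_share:.1%}")   # 30.6%
```

Because "anonymous" may cover several people, a speaker-disjoint train/test split based on these labels alone would be approximate for roughly a third of the utterances.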
The top-level directory contains a number of subdirectories, each corresponding to one recorded speaker/session. Each of these subdirectories is structured as follows:
├── wav/
│ ├── file1.wav
│ ├── file2.wav
│ ├── ...
├── etc/
│ ├── GPL_license.txt
│ ├── PROMPTS
│ ├── prompts-original
│ ├── README
where PROMPTS and prompts-original contain an audio id followed by a space and the prompt text (transcript).
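The layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names `parse_prompts` and `load_session` are hypothetical, and it assumes that each audio id in PROMPTS matches the stem of a file under wav/ and that the file is readable as UTF-8. Either assumption may need adjusting for the actual data.

```python
from pathlib import Path


def parse_prompts(text: str) -> dict[str, str]:
    """Map each audio id to its prompt text.

    Each non-empty line is expected to be: <audio id><space><prompt text>.
    """
    prompts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only; the transcript itself contains spaces.
        audio_id, _, transcript = line.partition(" ")
        prompts[audio_id] = transcript.strip()
    return prompts


def load_session(session_dir: str) -> list[tuple[Path, str]]:
    """Pair each WAV file in one speaker/session directory with its transcript."""
    session = Path(session_dir)
    # Assumption: PROMPTS is UTF-8; undecodable bytes are replaced, not fatal.
    text = (session / "etc" / "PROMPTS").read_text(encoding="utf-8", errors="replace")
    pairs = []
    for audio_id, transcript in parse_prompts(text).items():
        # Assumption: the audio id equals the WAV file name without extension.
        wav_path = session / "wav" / f"{audio_id}.wav"
        if wav_path.exists():
            pairs.append((wav_path, transcript))
    return pairs
```

Calling `load_session` on each subdirectory of the top-level directory would yield (audio path, transcript) pairs suitable for building an ASR training manifest. Ids without a matching WAV file are skipped silently, which is a design choice of this sketch.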
See https://www.voxforge.org/home/about for more details about the project and dataset.