Task: ASR
Release Date: 5/12/2026
Format: WAV, TSV
Size: 58.20 MB
16 minutes (157 utterances) of read speech in فارسی (Persian), collected via the VoxForge project.
Licensing
GNU General Public License v3.0 or later (GPL-3.0-or-later)
https://spdx.org/licenses/GPL-3.0-or-later.html
Restrictions/Special Constraints
N/A
Forbidden Usage
N/A
Ethical Review
Describe the ethical review process that was followed for this dataset, including any approvals or considerations related to data collection and usage.
Intended Use
ASR training and evaluation
Voice data contributed by volunteers who read prompts out loud. For فارسی (Persian), there are 16 minutes of recorded speech.
The following table breaks down the number of utterances per speaker. There are at least 4 speakers, since the "anonymous" entries may come from more than one contributor:
| Name | Count |
|---|---|
| anonymous | 127 |
| PCRider | 10 |
| Sina | 10 |
| spin313 | 10 |
The top-level directory contains one subdirectory per recorded speaker/session. Each of these subdirectories is structured as follows:
├── wav/
│ ├── file1.wav
│ ├── file2.wav
│ ├── ...
├── etc/
│ ├── GPL_license.txt
│ ├── PROMPTS
│ ├── prompts-original
│ ├── README
where PROMPTS and prompts-original contain an audio id followed by a space and the prompt text (transcript).
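The layout above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names `parse_prompts` and `load_session` are hypothetical, the files are assumed to be UTF-8 encoded, and each audio id is assumed to end in the stem of the matching file under wav/ (for example, an id ending in `file1` pairing with `wav/file1.wav`). Check these assumptions against the actual files before relying on it.

```python
from pathlib import Path


def parse_prompts(text):
    """Map each audio id to its transcript.

    Each non-empty line is split at the first run of whitespace:
    the left part is the audio id, the rest is the prompt text.
    """
    prompts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)
        audio_id = parts[0]
        transcript = parts[1] if len(parts) > 1 else ""
        prompts[audio_id] = transcript
    return prompts


def load_session(session_dir, encoding="utf-8"):
    """Return (wav_path, transcript) pairs for one speaker/session directory.

    Assumption: the last path component of each audio id is the stem
    of a file in wav/. The UTF-8 encoding is also an assumption.
    """
    session_dir = Path(session_dir)
    text = (session_dir / "etc" / "PROMPTS").read_text(encoding=encoding)
    pairs = []
    for audio_id, transcript in parse_prompts(text).items():
        wav_path = session_dir / "wav" / (Path(audio_id).name + ".wav")
        pairs.append((wav_path, transcript))
    return pairs
```

Calling `load_session` on each subdirectory of the top-level directory yields the audio/transcript pairs needed for ASR training or evaluation. Paths whose WAV file does not exist can be filtered out with `wav_path.exists()`.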
See https://www.voxforge.org/home/about for more details about the project and dataset.