Task: ASR
Release Date: 5/12/2026
Format: WAV, TSV
Size: 126.98 MB
32 minutes (307 utterances) of read speech of עברית (Hebrew), collected via the VoxForge project.
Licensing
GNU General Public License v3.0 or later (GPL-3.0-or-later)
https://spdx.org/licenses/GPL-3.0-or-later.html
Restrictions/Special Constraints
N/A
Forbidden Usage
N/A
Ethical Review
Describe the ethical review process that was followed for this dataset, including any approvals or considerations related to data collection and usage.
Intended Use
ASR training and evaluation
Voice data contributed by volunteers who read prompts out loud. For עברית (Hebrew), there are 32 minutes of recorded speech.
The following is a breakdown of the number of utterances per speaker (at least 12 speakers):
| Name | Count |
|---|---|
| anonymous | 140 |
| EranGross | 40 |
| ethanoconnors | 40 |
| Nir | 10 |
| TalSasson | 10 |
| avia | 10 |
| gilevi74 | 10 |
| jazzman | 10 |
| stuk | 10 |
| tomerhanuni | 10 |
| yoavguez | 10 |
| alexey | 7 |
The top-level directory contains one subdirectory per recorded speaker/session. Each of these subdirectories is structured as follows:
├── wav/
│   ├── file1.wav
│   ├── file2.wav
│   └── ...
└── etc/
    ├── GPL_license.txt
    ├── PROMPTS
    ├── prompts-original
    └── README
where PROMPTS and prompts-original contain an audio id followed by a space and the prompt text (transcript).
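The prompt file format described above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset tooling: the function names `parse_prompt_line` and `read_prompts` are hypothetical, and UTF-8 encoding of the PROMPTS file is an assumption. How an audio id maps to a file name under wav/ is not specified here, so the sketch returns the ids unchanged.

```python
from pathlib import Path


def parse_prompt_line(line: str) -> tuple[str, str]:
    """Split one prompt line into (audio_id, transcript).

    The split happens at the first space only, so spaces inside
    the transcript are preserved.
    """
    audio_id, _, transcript = line.strip().partition(" ")
    return audio_id, transcript.strip()


def read_prompts(session_dir: Path) -> dict[str, str]:
    """Read etc/PROMPTS from one speaker/session directory.

    Assumption: the file is UTF-8 encoded. Blank lines are skipped.
    """
    prompts: dict[str, str] = {}
    text = (session_dir / "etc" / "PROMPTS").read_text(encoding="utf-8")
    for line in text.splitlines():
        if line.strip():
            audio_id, transcript = parse_prompt_line(line)
            prompts[audio_id] = transcript
    return prompts
```

Calling `read_prompts(Path("some_session_dir"))` on one of the speaker/session subdirectories (the path here is a placeholder) would give a dictionary from audio id to transcript, which can then be joined with the files under wav/ once the id-to-file mapping is confirmed.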
See https://www.voxforge.org/home/about for more details about the project and dataset.