License: GPL-3.0
Steward: MDC Community Concierge
Task: ASR
Release Date: 5/7/2026
Format: WAV, TSV
Size: 234.06 MB
Just over 1 hour of read speech with transcriptions in Bulgarian.
Licensing
GNU General Public License v3.0 or later (GPL-3.0-or-later)
https://spdx.org/licenses/GPL-3.0-or-later.html
Restrictions/Special Constraints
N/A
Forbidden Usage
N/A
The voice data was contributed by volunteers who read prompts aloud. For Bulgarian, there is just over 1 hour of recorded speech.
The following is a breakdown of the number of utterances per speaker:
| Speaker | Count |
|---|---|
| anonymous | 308 |
| FF | 30 |
| Vlad_Cepesh | 30 |
| kvabakoma | 20 |
| Adi | 10 |
| AlexGotev | 10 |
| BGGeorgi | 10 |
| Garo02 | 10 |
| Grountex | 10 |
| Ivo | 10 |
| dtd | 10 |
| ff | 10 |
| mary | 10 |
| tux_bg | 10 |
| wladi | 10 |
The top-level directory contains a number of subdirectories, each corresponding to one recorded speaker/session. Each of these subdirectories is structured as follows:
├── wav/
│ ├── file1.wav
│ ├── file2.wav
│ ├── ...
├── etc/
│ ├── GPL_license.txt
│ ├── PROMPTS
│ ├── prompts-original
│ ├── README
where each line of PROMPTS and prompts-original contains an audio id followed by a space and the prompt text (transcript).
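The layout above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names `parse_prompts` and `load_session` are hypothetical, the UTF-8 default encoding is an assumption (the files may use a different encoding), and the sketch does not assume any particular mapping between audio ids and WAV file names.

```python
from pathlib import Path


def parse_prompts(text):
    """Map each audio id to its prompt text.

    Each non-empty line is split on the first run of whitespace:
    the first token is taken as the audio id, the rest as the transcript.
    """
    prompts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        audio_id = parts[0]
        transcript = parts[1] if len(parts) > 1 else ""
        prompts[audio_id] = transcript
    return prompts


def load_session(session_dir, encoding="utf-8"):
    """Read etc/PROMPTS and list wav/*.wav for one speaker/session directory.

    The encoding default is an assumption; pass another value if the
    prompt files do not decode as UTF-8.
    """
    session_dir = Path(session_dir)
    text = (session_dir / "etc" / "PROMPTS").read_text(encoding=encoding)
    wav_files = sorted((session_dir / "wav").glob("*.wav"))
    return parse_prompts(text), wav_files
```

Calling `load_session` on one of the subdirectories returns a dictionary of prompts and a sorted list of WAV paths; how the audio ids line up with the file names should be checked against the actual data before pairing them.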
See https://www.voxforge.org/home/about for more details about the project and dataset.