Task: TTS
Release Date: 12/6/2025
Format: WEBM
Size: 72.38 MB
Text-to-speech dataset for Slovak, female speaker, approximately 2 hours of read speech.
Forbidden Usage
You agree not to attempt to determine the identity of speakers in the dataset.
Intended Use
Training and fine-tuning text-to-speech models
This dataset contains approximately 2 hours of scripted speech for Slovak (sk) from a single speaker.
Slovak is a West Slavic language and serves as the official language of Slovakia.
There are no variants defined for this dataset.
The age and gender of the speaker were not reported. Dataset names may be gendered, but they were assigned solely according to the speaker's preference.
The text corpus comes from Piper Recording Studio, which extends Microsoft's sample TTS scripts for Azure.
Microsoft provides the following recommendations:
To use these example scripts for training, it's recommended that you should do the sanity check to make sure it matches what the voice talent actually speaks in the audio and normalize the text before uploading the data. For example, change '50%' to fifty percent and '$45' to forty-five dollars. Normalization should apply to the scripts that contain digits, symbols, abbreviations, date, and time.
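As a rough aid for the sanity check Microsoft describes, the following Python sketch flags script lines that contain digits or common symbols and may still need to be spelled out. It only locates candidates; the rewriting itself (e.g. '50%' to fifty percent) is still done by hand. The symbol set is an assumption and not exhaustive, dates and times are caught only through their digits, and abbreviations written purely in letters are not detected. The helper names are hypothetical.

```python
import re

# Assumed (non-exhaustive) trigger set: any ASCII digit, or one of these symbols.
# Dates and times are caught via their digits; plain-letter abbreviations are not.
NEEDS_NORMALIZATION = re.compile(r"[0-9%$€&@#+/]")


def needs_normalization(sentence: str) -> bool:
    """Return True if the sentence contains a digit or one of the listed symbols."""
    return NEEDS_NORMALIZATION.search(sentence) is not None


def flag_sentences(sentences: list[str]) -> list[tuple[int, str]]:
    """Return (index, sentence) pairs that should be normalized by hand."""
    return [(i, s) for i, s in enumerate(sentences) if needs_normalization(s)]


if __name__ == "__main__":
    scripts = [
        "Ani lekári vám už nepomôžu?",
        "Sales grew 50% this year.",
        "The ticket costs $45.",
    ]
    for index, sentence in flag_sentences(scripts):
        print(index, sentence)
```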
Statistics for the text corpus:
Average/median characters per sentence: 64/59
Average/median words per sentence: 10.5/10
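Statistics of this kind can be computed with a short Python sketch. The card does not say how characters or words were counted, so this assumes characters are Unicode code points of the NFC-normalized sentence (spaces and punctuation included) and words are whitespace-separated tokens; the function name `corpus_stats` is hypothetical, and results on the full corpus may differ from the figures above if a different convention was used.

```python
import unicodedata
from statistics import mean, median


def corpus_stats(sentences: list[str]) -> dict[str, tuple[float, float]]:
    """Return (average, median) characters and words per sentence.

    Assumption: characters are code points after NFC normalization,
    including spaces and punctuation; words are whitespace-separated tokens.
    """
    normalized = [unicodedata.normalize("NFC", s) for s in sentences]
    chars = [len(s) for s in normalized]
    words = [len(s.split()) for s in normalized]
    return {
        "chars": (mean(chars), median(chars)),
        "words": (mean(words), median(words)),
    }


if __name__ == "__main__":
    sample = [
        "Možno keď sa usadím v práci.",
        "Ani lekári vám už nepomôžu?",
    ]
    stats = corpus_stats(sample)
    print("Average/median characters per sentence:", stats["chars"])
    print("Average/median words per sentence:", stats["words"])
```

NFC normalization matters here: a decomposed "ž" (z plus a combining caron) would otherwise count as two characters.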
Slovak uses an extended Latin alphabet.
Standard alphabet:
Lowercase: a b c d e f g h i j k l m n o p r s t u v w x y z á ä é í ó ô ú ý č ď ĺ ľ ň ŕ š ť ž
Uppercase: A B C D E F G H I J K L M N O P R S T U V X Y Z Á Ú Č Ď Ľ Š Ť Ž
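To check a script against the letters listed above, a minimal Python sketch can report any letter that falls outside those lists. The two sets are copied verbatim from this card; digits, spaces, and punctuation are deliberately ignored, and the helper name `unexpected_letters` is hypothetical.

```python
import unicodedata

# Letter lists copied verbatim from the "Standard alphabet" section of this card.
LOWERCASE = set(
    "a b c d e f g h i j k l m n o p r s t u v w x y z "
    "á ä é í ó ô ú ý č ď ĺ ľ ň ŕ š ť ž".split()
)
UPPERCASE = set("A B C D E F G H I J K L M N O P R S T U V X Y Z Á Ú Č Ď Ľ Š Ť Ž".split())
ALPHABET = LOWERCASE | UPPERCASE


def unexpected_letters(sentence: str) -> set[str]:
    """Return letters in the sentence that are not in the listed alphabet.

    Only alphabetic characters are checked; spaces, digits, and punctuation are ignored.
    """
    text = unicodedata.normalize("NFC", sentence)
    return {ch for ch in text if ch.isalpha() and ch not in ALPHABET}


if __name__ == "__main__":
    for sentence in [
        "Ani lekári vám už nepomôžu?",
        "Die Straße ist lang.",
    ]:
        print(sentence, "->", sorted(unexpected_letters(sentence)))
```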
5 randomly selected sentences:
Podľa informácií od zložiek integrovaného záchranného systému sa situácia dá zvládnuť ich činnosťou a dostupnými technickými prostriedkami.
Možno keď sa usadím v práci.
Š V K v Banskej Bystrici je univerzálna vedecká knižnica sídliaca v Banskej Bystrici.
Hypotéky zdražujú v reakcii na krok E C B napriek tomu, že sa čakalo, že zvýšenie týchto sadzieb už je započítané v cenách na finančnom trhu.
Ani lekári vám už nepomôžu?
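The sample sentences write acronyms as space-separated capital letters ("Š V K", "E C B"). This suggests, though the card does not state it, that acronyms were spelled out letter by letter when the scripts were normalized. A Python sketch of that convention, under that assumption, could look like this; `spell_out_acronyms` is a hypothetical helper name.

```python
import re


def spell_out_acronyms(text: str) -> str:
    """Rewrite all-caps words of two or more letters as space-separated letters.

    Hypothetical helper that mimics the style seen in the sample sentences
    (e.g. "ECB" -> "E C B"); words containing digits or mixed case are left alone.
    """

    def repl(match: re.Match) -> str:
        word = match.group(0)
        if len(word) >= 2 and word.isalpha() and word.isupper():
            return " ".join(word)
        return word

    # \w is Unicode-aware for str patterns, so letters such as Š are matched.
    return re.sub(r"\w+", repl, text)


if __name__ == "__main__":
    print(spell_out_acronyms("Hypotéky zdražujú v reakcii na krok ECB."))
    print(spell_out_acronyms("ŠVK v Banskej Bystrici je vedecká knižnica."))
```

Single-letter words such as the preposition "v" are left untouched because of the two-letter minimum.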
Audio was recorded online using Piper Recording Studio. No post-processing or validation was done to the text or audio.
A pre-trained Piper voice model is available for download.
If you would like to contribute your voice and have us train a Piper text-to-speech model, please contact us at voice@openhomefoundation.org.
We would like to thank all contributors, as well as supporters of the Open Home Foundation.
This dataset is released under the Creative Commons Zero (CC-0) license. By downloading this data, you agree not to determine the identity of speakers in the dataset.