LibriVox Czech TTS Female Voice
License:
CC0-1.0
Steward:
MDC Curators
Task: TTS
Release Date: 4/15/2026
Format: MP3, TXT, TSV
Size: 178.58 MB
Description
2 hours of sentence-aligned speech/text from "Krysař" by Viktor Dyk, on LibriVox, containing over 1,500 sentences and nearly 16,000 words.
Specifics
Considerations
Restrictions/Special Constraints
N/A
Forbidden Usage
You agree not to attempt to determine the identity of the speaker in this dataset.
Processes
Intended Use
Training neural TTS acoustic models (e.g., FastSpeech, VITS, or similar architectures)
Fine-tuning pre-trained multilingual TTS models for Czech
Benchmarking Czech speech synthesis quality
Linguistic research on Czech prosody and phonetics
Metadata
Datasheet: Krysař — Czech TTS Dataset
Dataset Overview
| Language | Czech (cs) |
| Source Text | Krysař by Viktor Dyk |
| Source Audio | LibriVox public domain recording (https://librivox.org/krysar-by-viktor-dyk/) |
| Speaker | Single female speaker |
| Alignment | Sentence-level |
| License | CC0-1.0 |
The Language
Czech (český jazyk) is a West Slavic language of the Indo-European family, written in the Latin script with diacritical marks. It is the official language of the Czech Republic and is spoken by approximately 10–11 million people worldwide, including diaspora communities in Slovakia, Austria, and North America.
The Source Text
Title: Krysař (English: The Rat Catcher or The Pied Piper)
Author: Viktor Dyk (31 December 1877 – 14 May 1931)
First Published: 1915
Krysař is a short prose novella and one of the most celebrated works of Viktor Dyk, a leading figure of Czech symbolism and decadence. Loosely based on the German legend of the Pied Piper of Hamelin, the story follows a mysterious rat-catcher who arrives in the town of Hameln and becomes entangled in a doomed love affair. Dyk transforms the folk legend into a dark, allegorical meditation on love, betrayal, art, and collective moral failure. The language of Krysař reflects the literary Czech of the early twentieth century. It is fully intelligible to contemporary readers but carries a stylised, poetic register characteristic of the Czech Symbolist movement, with an elevated, often melancholic tone.
The author died in 1931, so the text is in the public domain in all jurisdictions where copyright lasts for the author's life plus 70 years or less.
The Source Audio
The audio was sourced from LibriVox, a volunteer-driven project founded in 2005 with the goal of recording all books in the public domain and making them freely available as audiobooks. LibriVox recordings are dedicated to the public domain, meaning they may be freely used, distributed, and adapted for any purpose, including the creation of speech datasets. The recording used for this dataset features a single female reader.
Dataset Construction
Alignment Method
The dataset was constructed by sentence-aligning the source text with the LibriVox audio recording of Krysař. Here, a "sentence" is an approximation: a span of text that ends in sentence-final punctuation. To obtain sentence-level alignments, the Montreal Forced Aligner was used to produce word-level alignments, which were then rolled up to the sentence level.
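The roll-up step can be illustrated with a minimal sketch. It is not the curators' actual script: the `WordInterval` type and the `roll_up_to_sentences` function are hypothetical names, and the sketch assumes the word-level intervals have already been parsed from the aligner output, contain no silence intervals, and match the whitespace-separated tokens of each sentence one to one.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordInterval:
    """One aligned word (hypothetical container, times in seconds)."""
    word: str
    start: float
    end: float


def roll_up_to_sentences(
    sentences: List[str], words: List[WordInterval]
) -> List[Tuple[str, float, float]]:
    """Assign each sentence the start of its first word and the end of its last.

    Assumes the aligned words appear in text order and that each sentence
    contributes exactly len(sentence.split()) words.
    """
    spans = []
    cursor = 0
    for sentence in sentences:
        n_tokens = len(sentence.split())
        chunk = words[cursor:cursor + n_tokens]
        if n_tokens == 0 or len(chunk) < n_tokens:
            raise ValueError(f"Not enough aligned words for: {sentence!r}")
        spans.append((sentence, chunk[0].start, chunk[-1].end))
        cursor += n_tokens
    if cursor != len(words):
        raise ValueError("Aligned words left over after the last sentence")
    return spans
```

In practice the aligner may skip or merge tokens, so a real pipeline would likely need a more tolerant matching step than the strict token count used here.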
Preprocessing
The original audio contains a LibriVox introduction that is not represented in the source text. This introduction was removed from each chapter.
The original mp3 files used a variable bitrate. To ensure compatibility and simplify data validation, they were converted to a constant bitrate (128 kb/s).
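The datasheet does not say which tool performed the two audio steps above. One plausible way to do them, sketched here under the assumption that `ffmpeg` with the `libmp3lame` encoder is installed, is to seek past the introduction and re-encode at a fixed bitrate. The function names and the `skip_seconds` parameter are hypothetical, and the length of each introduction is not given in the source.

```python
import subprocess
from typing import List


def build_cbr_command(
    src: str, dst: str, bitrate_kbps: int = 128, skip_seconds: float = 0.0
) -> List[str]:
    """Build an ffmpeg command that re-encodes src to constant-bitrate MP3.

    skip_seconds (hypothetical) drops that much audio from the start,
    e.g. a spoken introduction; 0 keeps the whole file.
    """
    cmd = ["ffmpeg"]
    if skip_seconds > 0:
        # Placing -ss before -i seeks in the input before decoding.
        cmd += ["-ss", f"{skip_seconds:.3f}"]
    cmd += [
        "-i", src,
        "-codec:a", "libmp3lame",
        # With libmp3lame, -b:a requests a constant bitrate
        # (variable bitrate would be selected with -q:a instead).
        "-b:a", f"{bitrate_kbps}k",
        dst,
    ]
    return cmd


def convert(src: str, dst: str, **kwargs) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_cbr_command(src, dst, **kwargs), check=True)
```

Keeping command construction separate from execution makes the settings easy to inspect or log before any file is touched.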
Parentheses and brackets were removed.
Newlines were replaced with single spaces, and sentences were split on sentence-final punctuation (., !, ?).
Dashes and quotation marks were removed.
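The text steps above could be approximated as follows. This is a sketch, not the curators' code: the function name is hypothetical, and several details are assumptions, namely that only the bracket characters (not their contents) are removed, that "dashes" means en and em dashes rather than the ASCII hyphen, that the listed quotation marks cover those in the text, and that repeated spaces left behind are collapsed.

```python
import re
from typing import List

# Characters to delete (assumed sets; adjust to the actual text).
BRACKETS = str.maketrans("", "", "()[]{}")
DASHES_AND_QUOTES = str.maketrans(
    "", "",
    "\u2013\u2014"          # en dash, em dash
    "\u201e\u201c\u201d"    # double quotation marks, including Czech-style
    "\u201a\u2018\u2019"    # single quotation marks
    "\"\u00ab\u00bb",       # ASCII double quote and guillemets
)


def normalize_and_split(raw: str) -> List[str]:
    """Clean raw text and split it into approximate sentences."""
    text = raw.translate(BRACKETS)
    text = text.replace("\n", " ")
    text = text.translate(DASHES_AND_QUOTES)
    # Assumption: collapse whitespace runs left by the removals.
    text = re.sub(r"\s+", " ", text).strip()
    # Split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```

Because the split is purely punctuation-based, quoted speech followed by a reporting clause ends up as two separate "sentences", which is consistent with the datasheet's description of sentences as an approximation.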
Intended Uses
This dataset is intended for use in training, fine-tuning, and evaluating text-to-speech (TTS) systems for Czech. Potential applications include:
Training neural TTS acoustic models (e.g., FastSpeech, VITS, or similar architectures)
Fine-tuning pre-trained multilingual TTS models for Czech
Benchmarking Czech speech synthesis quality
Linguistic research on Czech prosody and phonetics
Limitations and Biases
Single speaker: The dataset contains audio from a single LibriVox volunteer reader. Speaker diversity is therefore absent.
Gender: All audio is from a single female speaker. Models trained on this data will not generalize to male voices without additional data or adaptation.
Register: The language is formal and literary, which may limit the naturalness of TTS output in everyday contexts.
Domain: The dataset covers a single novella from a specific genre (allegorical fiction), limiting topical and stylistic diversity.
Further Reading
Dyk, V. (1915). Krysař. Prague.
LibriVox: https://librivox.org