Task: MT
Release Date: 5/15/2026
Format: MP3, TSV
Size: 225.93 MB
CoVoST 2 is a large-scale multilingual speech-to-text translation corpus based on Mozilla Common Voice 4.0. This segment of the corpus contains the Mongolian audio (7 hours) and its English translations.
Licensing
Creative Commons Attribution Non Commercial 4.0 International (CC-BY-NC-4.0)
https://spdx.org/licenses/CC-BY-NC-4.0.html
Restrictions/Special Constraints
Research and non-commercial use only.
Forbidden Usage
You agree not to attempt to determine the identity of speakers in this dataset. You agree not to train models for public distribution on this dataset. Any attempt to clone the voice or train models that imitate the speakers in this dataset is forbidden.
Ethical Review
This dataset contains data from speakers who have asked to be removed from the Mozilla Common Voice dataset, so we expect you to treat it with care and to use it only for research and non-commercial purposes.
Intended Use
Replication experiments involving the CoVoST 2 datasets.
This dataset contains 5590 audio clips totalling 07:38:57 of audio in Mongolian with the corresponding translations in English.
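As a quick sanity check on these figures, the average clip length can be worked out from the stated total duration and clip count. The sketch below only does that arithmetic; the helper name `hms_to_seconds` is illustrative and not part of the dataset.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures taken from the dataset description above.
TOTAL_SECONDS = hms_to_seconds("07:38:57")
NUM_CLIPS = 5590

MEAN_CLIP_SECONDS = TOTAL_SECONDS / NUM_CLIPS
print(f"{MEAN_CLIP_SECONDS:.2f} s per clip on average")  # prints "4.93 s per clip on average"
```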
End-to-end speech-to-text translation (ST) has recently witnessed increased interest given its system simplicity, lower inference latency and fewer compounding errors compared to cascaded ST (i.e. speech recognition + machine translation). End-to-end ST model training, however, is often hampered by the lack of parallel data. Thus, we created CoVoST, a large-scale multilingual ST corpus based on Common Voice, to foster ST research with the largest ever open dataset. Its latest version covers translations from English into 15 languages (Arabic, Catalan, Welsh, German, Estonian, Persian, Indonesian, Japanese, Latvian, Mongolian, Slovenian, Swedish, Tamil, Turkish, Chinese) and from 21 languages into English, including the 15 target languages as well as Spanish, French, Italian, Dutch, Portuguese and Russian. It has a total of 2,880 hours of speech and is diversified with 78K speakers.
path: Filename of the audio file
sentence: The sentence in the source language
translation: The sentence in the target language
client_id: The ID of the speaker of the source language, used for maintaining hygiene in the splits.
| path | sentence | translation | client_id |
|---|---|---|---|
| common_voice_mn_18724779.mp3 | Том цагаан аварга загас нь томоохон сээр нуруутан амьтдаар хооллоно. | This big white fish feeds on creatures with a prominent spine. | c0e85a0234a072e77b7c11a98d4f063e52913f90343bee3c0a846d21e4af0cee3091b67cc47155158736599f931cc163beac47e29f2892ad53cf19d72ac022cb |
| common_voice_mn_18724780.mp3 | Уг ёс нь түүний бичгийг шалгасан бол хамаг хэрэг тодорхой болохсон. | If you could have checked his paper, it would have become certain already. | c0e85a0234a072e77b7c11a98d4f063e52913f90343bee3c0a846d21e4af0cee3091b67cc47155158736599f931cc163beac47e29f2892ad53cf19d72ac022cb |
| common_voice_mn_18724781.mp3 | Мемушай зүүн жигүүрээс ирсэн бөмбөгийг санаандаа хүртэл тогтоож авч чадаагүй ч довтолгоогоо үргэлжлүүллээ. | Even if Memushaj didn’t complete his attempt from left side he continued to play. | c0e85a0234a072e77b7c11a98d4f063e52913f90343bee3c0a846d21e4af0cee3091b67cc47155158736599f931cc163beac47e29f2892ad53cf19d72ac022cb |
| common_voice_mn_18724782.mp3 | Аль нэгэн зарлиг захиаг буулгахад заавал гурван хүний хэлсэн адил байх хэрэгтэй. | what the three people's said has to be similar to write down any order. | c0e85a0234a072e77b7c11a98d4f063e52913f90343bee3c0a846d21e4af0cee3091b67cc47155158736599f931cc163beac47e29f2892ad53cf19d72ac022cb |
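Rows like the sample above can be read with Python's standard `csv` module. This is a minimal sketch, assuming the TSV file is tab-separated, starts with the header row shown above, and uses no quoting; the function name `read_covost_tsv` and the rows in `sample` are made up for illustration and are not part of the dataset.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names as listed in the field descriptions above.
EXPECTED_COLUMNS = ["path", "sentence", "translation", "client_id"]


def read_covost_tsv(handle: TextIO) -> List[Dict[str, str]]:
    """Read all rows from an open TSV file handle into a list of dicts.

    Assumes tab-separated fields and no quoting (QUOTE_NONE), so that
    quotation marks inside sentences are kept as literal characters.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical in-memory sample; real rows look like the ones shown above.
sample = (
    "path\tsentence\ttranslation\tclient_id\n"
    "clip_0001.mp3\tsource text one\ttarget text one\tspeaker_a\n"
    'clip_0002.mp3\tsource "quoted" text\ttarget text two\tspeaker_b\n'
)
rows = read_covost_tsv(io.StringIO(sample))
print(rows[0]["path"], rows[0]["client_id"])
```

For a file on disk, the handle would typically be opened with `open(tsv_path, encoding="utf-8", newline="")`, where `tsv_path` is wherever the TSV was saved.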
| Split | # clips |
|---|---|
| Train | 2068 |
| Dev | 1762 |
| Test | 1760 |
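Because `client_id` is described as the field used to keep the splits clean, one check a replication experiment might run is whether any speaker appears in more than one split. The sketch below is an illustration only: it assumes the rows of each split have already been loaded as lists of dicts with a `client_id` key, and the names `speakers`, `shared_speakers` and the `toy` data are hypothetical. It makes no claim about what the check reports on the real data.

```python
from typing import Dict, Iterable, List, Set


def speakers(rows: Iterable[Dict[str, str]]) -> Set[str]:
    """Collect the distinct client_id values in a split."""
    return {row["client_id"] for row in rows}


def shared_speakers(splits: Dict[str, List[Dict[str, str]]]) -> Dict[str, Set[str]]:
    """Return, for each pair of splits, the client_ids that appear in both."""
    names = sorted(splits)
    overlaps: Dict[str, Set[str]] = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            overlaps[f"{first}/{second}"] = speakers(splits[first]) & speakers(splits[second])
    return overlaps


# Clip counts from the table above; they add up to the 5590 clips in the dataset.
CLIP_COUNTS = {"train": 2068, "dev": 1762, "test": 1760}
assert sum(CLIP_COUNTS.values()) == 5590

# Toy data with made-up speaker IDs, where "spk_b" leaks from train into test.
toy = {
    "train": [{"client_id": "spk_a"}, {"client_id": "spk_b"}],
    "dev": [{"client_id": "spk_c"}],
    "test": [{"client_id": "spk_b"}],
}
overlaps = shared_speakers(toy)
print(overlaps["test/train"])  # prints {'spk_b'}
```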
If you use this dataset in your work, please cite:
@misc{wang2020covost,
title={CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus},
author={Changhan Wang and Anne Wu and Juan Pino},
year={2020},
eprint={2007.10310},
archivePrefix={arXiv},
primaryClass={cs.CL}
}