Task: MT
Release Date: 6/10/2026
Format: CSV
Size: 262.01 KB
This is an adapted and reorganized set of Khakas-to-Russian translation data from AIRI's AI4TALK competition. It is intended for machine translation: each row contains a Khakas source text and its Russian translation. The task is text-only and includes no audio files. The original AI4TALK language code and the recommended MDC source locale are both `kjh`. Khakas is a South Siberian Turkic language spoken mainly in the Republic of Khakassia in Russia, with roughly 29,000 speakers according to Wikipedia's infobox at the time of writing.
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Restrictions/Special Constraints
Users must comply with the Creative Commons Attribution-ShareAlike 4.0 International license terms, including attribution and share-alike.
Forbidden Usage
Users must not use the data to reveal or infer any personal information about the speakers or anyone else who contributed the source language data.
Ethical Review
The dataset was part of a competition that has already concluded, and no ethical problems were raised at that time.
Intended Use
This dataset is intended for machine translation.
Technical summary: this package contains translation.csv with 5,259 text rows and no audio files. The CSV columns are id, lang, source, and translation. The license is Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
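The summary above can be checked mechanically. The following is a minimal sketch, not an official loader: the function name `summarize` is hypothetical, and the in-memory demo rows are placeholder text rather than real dataset rows. It validates the four column names and counts rows and language codes.

```python
import csv
import io

# Column names as listed in the technical summary of this card.
EXPECTED_COLUMNS = ["id", "lang", "source", "translation"]


def summarize(stream):
    """Validate the CSV header and return (row_count, set_of_lang_codes).

    `stream` is any text stream, e.g. an open file or io.StringIO.
    """
    reader = csv.DictReader(stream)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    count = 0
    langs = set()
    for row in reader:
        count += 1
        langs.add(row["lang"])
    return count, langs


# Tiny in-memory stand-in with placeholder text (NOT real dataset rows).
demo = io.StringIO(
    "id,lang,source,translation\n"
    "0,kjh,src a,tgt a\n"
    "1,kjh,src b,tgt b\n"
)
row_count, lang_codes = summarize(demo)
print(row_count, lang_codes)
```

Against the real package, one would presumably call `summarize(open("translation.csv", encoding="utf-8", newline=""))` and compare the row count with the 5,259 rows stated above. The UTF-8 encoding is an assumption, since the card does not state the file encoding.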
Source and provenance: this is an adapted and reorganized subset of AI4TALK by the Artificial Intelligence Research Institute (AIRI), original URL https://github.com/AIRI-Institute/AI4TALK. The Khakas material comes from the HSE/LingConLab Spoken corpus of the dialects of Khakas; source data and corpus context are available at https://github.com/LingConLab/data_oral_khakas_corpus/ and http://lingconlab.ru/spoken_khakas/.
Transcription and conventions: the source column uses the conventional Cyrillic orthography of Khakas. Phonetic transcription rules are covered at http://lingconlab.ru/spoken_khakas/.
Sample row:
id,lang,source,translation
0,kjh," кӧрзем, ам хыйға кізілер чахсы чуртапча.","когда посмотрю, сейчас умные люди хорошо живут."
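Both text fields in the sample row contain commas inside double quotes, so the file needs a real CSV parser rather than a plain split on commas. The sketch below parses the sample row with Python's standard `csv` module. Stripping the leading space from the source field is an assumption about how a user might want to normalize the text, not something the card prescribes.

```python
import csv
import io

# The header and sample row from this card, held in memory.
sample = (
    'id,lang,source,translation\n'
    '0,kjh," кӧрзем, ам хыйға кізілер чахсы чуртапча.",'
    '"когда посмотрю, сейчас умные люди хорошо живут."\n'
)

# csv.DictReader honours the double quotes, so commas inside a field are kept.
rows = list(csv.DictReader(io.StringIO(sample)))
row = rows[0]

# For contrast: a naive split breaks the quoted fields apart.
naive_fields = sample.splitlines()[1].split(",")

# Assumption: leading/trailing whitespace in the text is not meaningful.
source_text = row["source"].strip()

print(len(row), len(naive_fields))
print(source_text)
print(row["translation"])
```

The parser yields the four documented columns, whereas the naive split produces six fragments for this row.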