Task: MT
Release Date: 6/23/2026
Format: CSV
Size: 61.95 KB
This dataset is an adapted and reorganized set of Evenki-to-Russian translation data from AIRI's AI4TALK competition. It is intended for machine translation: each row contains Evenki source text (in conventional Evenki Cyrillic orthography) and a Russian translation. The task is text-based and does not include audio files. The original AI4TALK language code and the recommended MDC source locale are both `evn`. Evenki is a Northern Tungusic language spoken in eastern Russia and China, with roughly 17,000 native speakers according to Wikipedia's current infobox.
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Restrictions/Special Constraints
Users must comply with the Creative Commons Attribution-ShareAlike 4.0 International license terms, including attribution and share-alike.
Forbidden Usage
Users must not use the data to reveal or otherwise infer any personal information about the speakers or those who have contributed the source language data.
Ethical Review
The dataset was part of an already completed competition, and no ethical problems were raised at that time.
Intended Use
This dataset is intended for machine translation.
Technical summary: this package contains translation.csv with 1,392 text rows and no audio files. The CSV columns are id, lang, source, and translation. The license is Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
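A minimal sketch of reading the file into (source, translation) pairs with Python's standard csv module. The column names come from the summary above; the helper name load_pairs, the header check, and the assumption that the file is UTF-8 encoded are illustrative and not part of the dataset documentation.

```python
import csv
import io

# Column names as listed in the technical summary.
EXPECTED_COLUMNS = ["id", "lang", "source", "translation"]


def load_pairs(handle, lang="evn"):
    """Return (source, translation) tuples for rows whose lang matches.

    `handle` is any text file-like object. This helper is hypothetical
    and only illustrates one way to consume the CSV.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [
        (row["source"], row["translation"])
        for row in reader
        if row["lang"] == lang
    ]


# In-memory stand-in for translation.csv, built from the documented sample row.
sample = io.StringIO(
    "id,lang,source,translation\n"
    "3,evn,горовэ-э нулгӣчэвун.,долго аргишили.\n"
)
pairs = load_pairs(sample)
```

For the real file, the same function could be called as `load_pairs(open("translation.csv", encoding="utf-8", newline=""))`, assuming the file sits in the working directory and is UTF-8 encoded.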
Source and provenance: this is an adapted and reorganized subset of AI4TALK by the Artificial Intelligence Research Institute (AIRI), original URL https://github.com/AIRI-Institute/AI4TALK. The Evenki material comes from the Institute of Linguistics RAS / Minority Languages of Russia expedition and corpus materials; corpus access and project context are available at https://gisly.net/corpus/ and https://minlang.iling-ran.ru/corpora/evenki.
Transcription and conventions: the source column uses conventional Evenki Cyrillic orthography. Alphabet (37 letters): а б в г д е ё ж з и й к л м н ӈ о п р с т у ф х һ ц ч ш щ ъ ы ь э ю я ӣ ӯ; the Evenki-specific letters are ӈ (velar nasal), һ (/h/) and the long vowels ӣ ӯ (other long vowels are written as a base vowel + combining macron). The Russian translation column uses standard Russian Cyrillic.
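The alphabet above can be used for a simple sanity check on the source column. The sketch below is one possible approach, not a procedure documented for this dataset: it applies Unicode NFC normalization (an assumption about how the text should be compared) so that и or у followed by a combining macron (U+0304) matches the precomposed ӣ and ӯ, and it reports any letters outside the 37-letter set. The function name unexpected_letters is hypothetical.

```python
import unicodedata

# The 37 letters listed in the conventions, NFC-normalized so the set
# holds precomposed code points regardless of how this file is encoded.
EVENKI_LETTERS = set(
    unicodedata.normalize("NFC", "абвгдеёжзийклмнӈопрстуфхһцчшщъыьэюяӣӯ")
)


def unexpected_letters(text):
    """Return the set of alphabetic characters not in the Evenki alphabet.

    Combining marks (such as the macron on other long vowels), digits,
    punctuation, and spaces are ignored because they are not alphabetic.
    """
    normalized = unicodedata.normalize("NFC", text.lower())
    return {ch for ch in normalized if ch.isalpha() and ch not in EVENKI_LETTERS}


# The documented sample row contains only letters from the alphabet.
sample_issues = unexpected_letters("горовэ-э нулгӣчэвун.")
```

A non-empty result does not necessarily indicate an error; it only flags rows that may deserve a manual look, for example Latin characters mixed into a Cyrillic string.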
Sample row:
id,lang,source,translation
3,evn,горовэ-э нулгӣчэвун.,долго аргишили.