Task: MT
Release Date: 6/9/2026
Format: CSV
Size: 89.27 KB
This is an adapted and reorganized Meadow Mari-to-Russian translation dataset derived from AIRI's AI4TALK competition. It is intended for machine translation: each row contains a Meadow Mari source text and its Russian translation. The task is text-based and does not include audio files. The original AI4TALK language code and the recommended MDC source locale are both `mhr`. Meadow Mari, also known as Meadow-Eastern Mari, is a Uralic Mari language used mostly in European Russia and Mari El, with roughly 470,000 native speakers according to Wikipedia's current infobox.
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Restrictions/Special Constraints
Users must comply with the Creative Commons Attribution-ShareAlike 4.0 International license terms, including attribution and share-alike.
Forbidden Usage
Users must not use the data to reveal or otherwise decipher any personal information about the speakers or about those who have contributed the source language data.
Ethical Review
The dataset was part of an already completed competition and raised no ethical problems at that time.
Intended Use
This dataset is intended for machine translation.
Technical summary: this package contains translation.csv with 2,473 text rows and no audio files. The CSV columns are id, lang, source, and translation. The license is Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Source and provenance: this is an adapted and reorganized subset of AI4TALK by the Artificial Intelligence Research Institute (AIRI), original URL https://github.com/AIRI-Institute/AI4TALK. The Meadow Mari material comes from the HSE/LingConLab Spoken Meadow Mari corpus; source data and corpus context are available at https://github.com/LingConLab/data_oral_meadow-mari_corpus and http://lingconlab.ru/spoken_meadow_mari/.
Transcription and conventions: the source column is written in the conventional Cyrillic orthography of Meadow Mari. Phonetic transcription rules are covered at http://lingconlab.ru/spoken_meadow_mari/.
Sample row:
id,lang,source,translation
14,mhr,"кок руш ӱдыр-шамыч улыт ыле,","две русские девушки были,"
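
Loading sketch: the following is a minimal, illustrative way to read rows in the layout shown above into (source, translation) pairs using Python's standard csv module. It is not part of the original package. The helper name read_pairs and the constant names are hypothetical, and UTF-8 encoding for translation.csv is an assumption. The sample row is embedded inline so the sketch runs without the file.

```python
import csv
import io

# Column layout as described in the technical summary.
EXPECTED_COLUMNS = ["id", "lang", "source", "translation"]


def read_pairs(handle):
    """Yield (source, translation) pairs from an open CSV text handle.

    Hypothetical helper for illustration; raises ValueError if the header
    does not match the documented column layout.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    for row in reader:
        yield row["source"], row["translation"]


# Inline copy of the header and sample row from this card.
SAMPLE = (
    "id,lang,source,translation\n"
    '14,mhr,"кок руш ӱдыр-шамыч улыт ыле,","две русские девушки были,"\n'
)

pairs = list(read_pairs(io.StringIO(SAMPLE)))
print(pairs)

# For the real file (assuming it is UTF-8 encoded):
# with open("translation.csv", encoding="utf-8", newline="") as f:
#     pairs = list(read_pairs(f))
```

Passing newline="" when opening the real file follows the csv module's guidance, so that quoted fields are parsed correctly even if they contain line breaks.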