Task: OTH
Release Date: 11/26/2025
Format: TSV
Size: 37.80 MB
14.6 million tokens in the Sursilvan variety of Romansh from the daily newspaper “La Quotidiana”.
This corpus contains Sursilvan articles published in the Romansh daily newspaper La Quotidiana between 1997 and 2008. The Sursilvan texts were automatically extracted from a mixed Romansh newspaper corpus using a Support Vector Machine classifier trained on a smaller, manually labeled dataset.
To the extent possible under law, the newspaper’s publisher Somedia has waived all copyright and related or neighboring rights to this corpus. This work is published from Switzerland.
| Language variant | IETF BCP47 language code | Corpus size |
|---|---|---|
| Rumantsch Sursilvan | rm-sursilv | 14.6 million tokens |
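
Since the corpus is distributed as TSV, a first sanity check after download might be to count rows and whitespace-separated tokens. The following is a minimal sketch, not the publisher's tooling: the column name `text` and the sample rows are hypothetical placeholders, because the actual column layout is not described here. A whitespace count will also not necessarily reproduce the reported 14.6 million tokens, since the tokenization used for that figure is not stated.

```python
import csv
import io


def count_tokens(tsv_file, text_column="text"):
    """Return (row_count, token_count) for a TSV file object.

    `text_column` is an assumed column name; adjust it to the real header.
    Tokens are counted by naive whitespace splitting.
    """
    # QUOTE_NONE treats quote characters as ordinary text, which is
    # usually the safer choice for newspaper prose stored as TSV.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = 0
    tokens = 0
    for row in reader:
        rows += 1
        tokens += len((row[text_column] or "").split())
    return rows, tokens


# Placeholder data standing in for the real file (hypothetical layout).
sample = io.StringIO(
    "id\ttext\n"
    "1\tfirst placeholder sentence\n"
    "2\tsecond one\n"
)
rows, tokens = count_tokens(sample)
print(rows, tokens)
```

To run this on the downloaded file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `count_tokens`, where `path` is wherever the TSV was saved.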