Ladino: Una Fraza al Diya

Description

"Una fraza al diya" (A Phrase a Day) is a Ladino language learning dataset prepared by Karen Sarhon of the Sephardic Center of Istanbul (SKAD). It consists of 307 sentences in Ladino (Judeo-Spanish) with parallel translations in Spanish, Turkish, and English. The sentences and images were originally published on SKAD's Instagram account (@sephardiccenteristanbul) and extracted using OCR. Audio recordings come from the accompanying web initiative (https://sefarad.com.tr/judeo-espanyolladino/frazadeldia/). The dataset was structured by Col·lectivaT as part of a project to support Ladino in the digital age.

Specifics

Licensing

Creative Commons Attribution 4.0 International (CC-BY-4.0)

https://spdx.org/licenses/CC-BY-4.0.html

Considerations

Restrictions/Special Constraints

Free to use for any purpose (commercial and non-commercial) with attribution to the original creators. Users must credit Karen Sarhon (Sephardic Center of Istanbul) and Col·lectivaT.

Forbidden Usage

The sentences were originally published on the Instagram account of the Sephardic Center of Istanbul (@sephardiccenteristanbul). Text and images were extracted using OCR. Audio recordings are from the accompanying web initiative. The dataset was structured by Col·lectivaT as part of the project "Judeo-Spanish: Connecting the two ends of the Mediterranean". The package includes OGG audio files in clips/, JPEG images in images/, and a metadata.tsv covering all 307 entries. Of these, 292 entries have audio and 304 have images, so 15 entries are missing audio and 3 are missing images. Please check README.md for more information.
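
A quick way to verify the coverage figures above is to count the rows of metadata.tsv that have a non-empty audio or image reference. The sketch below is only an illustration: the column names `audio` and `image` are assumptions (the actual header names are documented in README.md and may differ), and the in-memory sample stands in for the real file.

```python
import csv
import io


def read_tsv(handle):
    """Parse a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_present(rows, column):
    """Count rows whose value in `column` is non-empty after stripping."""
    return sum(1 for row in rows if (row.get(column) or "").strip())


# Hypothetical three-row sample; "id", "audio" and "image" are assumed
# column names, not the documented schema of metadata.tsv.
SAMPLE = (
    "id\taudio\timage\n"
    "1\tclips/1.ogg\timages/1.jpg\n"
    "2\t\timages/2.jpg\n"
    "3\tclips/3.ogg\t\n"
)

rows = read_tsv(io.StringIO(SAMPLE))
summary = {
    "entries": len(rows),
    "with_audio": count_present(rows, "audio"),
    "with_image": count_present(rows, "image"),
}
print(summary)
```

To run this against the real package, one would replace the sample with `open("metadata.tsv", newline="", encoding="utf-8")` and pass the actual column names. If the stated figures hold, the counts should come out as 307 entries, 292 with audio, and 304 with images.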

For more datasets published within this initiative, see the Ladino Data Hub.

Citation

If you use this dataset, please cite:

Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish. Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon. Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC 2022. Marseille, France. 20 June 2022.

Disclaimer

This dataset was developed as part of the project "Judeo-Spanish: Connecting the two ends of the Mediterranean", carried out by Col·lectivaT and the Sephardic Center of Istanbul within the framework of the “Grant Scheme for Common Cultural Heritage: Preservation and Dialogue between Turkey and the EU–II (CCH-II)”, implemented by the Ministry of Culture and Tourism of the Republic of Turkey with the financial support of the European Union. The content of this website is the sole responsibility of Col·lectivaT and does not necessarily reflect the views of the European Union.