License:
CC-BY-4.0
Steward:
Community
Task: MT
Release Date: 4/16/2026
Format: TXT
Size: 39.92 KB
Ladino-Spanish Lexical Resources is a collection of four lexical files for Ladino (Judeo-Spanish) and Spanish, compiled by Col·lectivaT and the Sephardic Center of Istanbul for use in a rule-based Spanish-to-Ladino machine translation system. The four files are: a Spanish–Ladino phrase dictionary (136 entries) digitized from the printed dictionary "Diksionaryo de Ladino a Espanyol" by Güler, Portal i Tinoco; a Spanish–Ladino word list (3,884 entries); a list of irregular Spanish verbs (1,299 entries); and a list of Spanish–Ladino conjugated verb pairs (2,379 entries). All files are plain text with semicolon-separated pairs. The resources were created as part of the project "Judeo-Spanish: Connecting the two ends of the Mediterranean", funded by the European Union and the Ministry of Culture and Tourism of the Republic of Turkey under the CCH-II grant scheme.
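Because the files are described as plain text with semicolon-separated pairs, they can be read with a few lines of Python. The following is a minimal sketch, not the loader used by the translator itself. It assumes UTF-8 encoding, one pair per line, and the Spanish entry before the semicolon; none of these details are confirmed by the description above. The function names and the sample strings are hypothetical placeholders, not real entries from the dataset.

```python
from pathlib import Path


def parse_pairs(text, sep=";"):
    """Parse 'source;target' lines into a list of (source, target) tuples.

    Assumption: one pair per line, split on the first separator only.
    Blank lines and lines without a separator are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or sep not in line:
            continue
        source, target = line.split(sep, 1)
        pairs.append((source.strip(), target.strip()))
    return pairs


def load_pairs(path, encoding="utf-8"):
    """Read a lexical file from disk (the encoding is an assumption)."""
    return parse_pairs(Path(path).read_text(encoding=encoding))


# Placeholder input, not real dataset content.
sample = "palabra_es_1;palabra_lad_1\n\nfrase es 2 ; frase lad 2\nlinea_sin_separador\n"
pairs = parse_pairs(sample)
lookup = dict(pairs)  # on duplicate source entries, later lines overwrite earlier ones
```

Converting the pairs to a dictionary is convenient for lookups but keeps only one translation per source entry; keep the list form if a source entry may appear with several translations.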
Licensing
Creative Commons Attribution 4.0 International (CC-BY-4.0)
https://spdx.org/licenses/CC-BY-4.0.html
Restrictions/Special Constraints
Attribution required. The dictionary content may not be republished in non-dataset forms (e.g., as a printed or electronic book or dictionary) without permission from the original authors.
Forbidden Usage
Use without attribution to the original creators is not permitted.
Ethical Review
Data digitized from a published printed dictionary with an open license. Compiled into dataset form by Col·lectivaT as part of a funded language preservation project.
Intended Use
Rule-based and neural machine translation development, lexicographic research, and NLP resource development for Ladino (Judeo-Spanish).
These lexical files were compiled by Col·lectivaT from the printed dictionary Diksionaryo de Ladino a Espanyol by Güler, Portal i Tinoco, and are used in the Spanish-Ladino translator app (source code).
For more datasets published within this initiative, see the Ladino Data Hub.
If you use this data, please cite:
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish
@inproceedings{oktem-etal-2022-preparing,
title = "Preparing an endangered language for the digital age: The Case of {J}udeo-{S}panish",
author = {{\"{O}}ktem, Alp and
Zevallos, Rodolfo and
Moslem, Yasmin and
{\"{O}}zt{\"u}rk, {\"{O}}zg{\"u}r G{\"u}ne{\c{s}} and
Gerson {\c{S}}arhon, Karen},
editor = "Ojha, Atul Kr. and
Ahmadi, Sina and
Liu, Chao-Hong and
McCrae, John P.",
booktitle = "Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.eurali-1.18/",
pages = "105--110",
}
This dataset was created as part of the project "Judeo-Spanish: Connecting the two ends of the Mediterranean", carried out by Col·lectivaT and the Sephardic Center of Istanbul within the framework of the "Grant Scheme for Common Cultural Heritage: Preservation and Dialogue between Turkey and the EU–II (CCH-II)", implemented by the Ministry of Culture and Tourism of the Republic of Turkey with the financial support of the European Union. The content of this website is the sole responsibility of Col·lectivaT and does not necessarily reflect the views of the European Union.