License:
CC BY-NC-ND 4.0
Steward:
EELLAK - GreekFOSS
Task: NLP
Release Date: 4/16/2026
Format: PARQUET
Size: 12.02 MB
This dataset is a structured digital export of the Triantafyllides Modern Greek Dictionary (Λεξικό της Κοινής Νεοελληνικής, Dictionary of Standard Modern Greek), sourced from the Greek Language Portal (https://www.greek-language.gr/greekLang/modern_greek/tools/lexica/triantafyllides/). It contains 46,745 entries covering the full Modern Greek lexicon. Each entry includes the headword (lemma), pronunciation, full dictionary text, and metadata such as page number and source URL.
Entries also include:
- Definitions
- Grammatical information
- Example phrases and idioms
- Etymology
- References to related words
Metadata Info:
page_no: (int) Page number in the original printed dictionary (1–4675)
lemma: (string) The headword / entry word (e.g., α, α- 1, αβέβαιος, …)
pronunciation: (string) Phonetic transcription (e.g., álfa, á)
entry_text: (string) Full dictionary entry text in Greek, including definitions, examples, usage notes, etymology, and cross-references
source_url: (string) URL of the original page on greek-language.gr
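As a minimal sketch of how the fields listed above might be checked after loading, the Python below validates one entry against the described schema. The names validate_entry, load_entries, EXPECTED_FIELDS and MAX_PAGE_NO are hypothetical helpers introduced here, not part of the dataset. The sample row reuses the lemma and pronunciation examples from the field list, while its entry_text and source_url values are placeholders. Reading the actual Parquet file is assumed to go through pandas with a Parquet engine such as pyarrow, and the file path is left to the caller because the card does not state a file name.

```python
from numbers import Integral

# Field names and types as described in the metadata list above.
EXPECTED_FIELDS = {
    "page_no": Integral,
    "lemma": str,
    "pronunciation": str,
    "entry_text": str,
    "source_url": str,
}
MAX_PAGE_NO = 4675  # upper bound of the page range stated in the card


def validate_entry(entry):
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in entry:
            problems.append(f"missing field: {name}")
        elif not isinstance(entry[name], expected_type):
            problems.append(f"wrong type for {name}")
    page_no = entry.get("page_no")
    if isinstance(page_no, Integral) and not 1 <= page_no <= MAX_PAGE_NO:
        problems.append("page_no out of range")
    return problems


def load_entries(path):
    """Read the Parquet file into a list of dicts (assumes pandas + pyarrow)."""
    import pandas as pd  # imported lazily so validate_entry works without it
    return pd.read_parquet(path).to_dict(orient="records")


# Placeholder row: lemma and pronunciation come from the examples above;
# entry_text and source_url are invented for illustration only.
sample = {
    "page_no": 1,
    "lemma": "α",
    "pronunciation": "álfa",
    "entry_text": "(placeholder text)",
    "source_url": "https://www.greek-language.gr/",
}
print(validate_entry(sample))  # → []
```

With the real file, the same check could be run over every row returned by load_entries. Whether some rows have empty or missing values (for example, an entry without a pronunciation) is not stated in the card, so the strict type check here is an assumption that may need relaxing.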
Licensing
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.en
Restrictions/Special Constraints
CC BY-NC-ND 4.0: non-commercial use only; no derivatives; attribution required; research and educational use.
Forbidden Usage
Any commercial use, redistribution, or derivative work without authorization from the original copyright holders is prohibited.
Ethical Review
This dataset contains content derived from copyrighted sources that are not owned by the dataset creators. The data has been collected and processed from lawfully accessed materials and structured for research purposes in language technology. The decision to include this dataset is based on the understanding that such use falls within applicable copyright frameworks for research. No ownership over the original content is claimed.
To ensure responsible use:
- The dataset is released under a CC BY-NC-ND 4.0 license (non-commercial, no derivatives)
- Users are explicitly informed that any use beyond research may require permission from the original rights holders
- The dataset is distributed via controlled platforms to support appropriate governance
Intended Use
This dataset is intended for non-commercial research in natural language processing, linguistics, and Greek-language AI. Example applications include:
- Language modeling experiments
- Lexical and semantic analysis
- Evaluation of NLP systems
- Academic research and benchmarking
Use of this dataset for training machine learning models (including large language models) is permitted only within a research context. Any commercial use, or use beyond research purposes, requires appropriate authorization from the original content rights holders. The dataset creators do not grant such rights.