Task: NLP
Release Date: 1/27/2026
Format: Markdown (.md)
Size: 251.63 MB
This dataset provides a comprehensive corpus of Greek digital books systematically aggregated from OpenBook.gr. Since its inception in 2010, the OpenBook platform has functioned as a central hub for the Greek open-access movement. The corpus spans a wide range of genres and formats and is curated to include only content that may be legally and freely distributed. It is a resource for Natural Language Processing (NLP), linguistic analysis, and the preservation of Greek digital heritage, keeping both historical public domain texts and modern creative works accessible for computational study.
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Restrictions/Special Constraints
Must comply with the license
Forbidden Usage
Commercial use (the license permits non-commercial use only)
This dataset is a comprehensive corpus of Greek digital books collected from OpenBook.gr. It includes a wide range of legally and freely distributable genres and formats, supporting NLP research, linguistic analysis, and the preservation of both historical and contemporary Greek digital literature.
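Since the corpus is distributed as Markdown (.md) files, a first pass for linguistic analysis could look like the minimal sketch below. It assumes the archive has been extracted to a local directory and that the files are UTF-8 encoded; neither the directory layout nor the encoding is stated on this page, and the function names `iter_documents` and `word_frequencies` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from pathlib import Path
import re

# Runs of letters only (digits and underscores excluded).
# Python's \w is Unicode-aware, so Greek letters are matched.
WORD_RE = re.compile(r"[^\W\d_]+")


def iter_documents(corpus_dir):
    """Yield (path, text) for every .md file under corpus_dir (assumed layout)."""
    for path in sorted(Path(corpus_dir).rglob("*.md")):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        yield path, path.read_text(encoding="utf-8", errors="replace")


def word_frequencies(corpus_dir):
    """Count lowercased word forms across all Markdown files in corpus_dir."""
    counts = Counter()
    for _path, text in iter_documents(corpus_dir):
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return counts
```

Calling `word_frequencies("path/to/extracted/corpus").most_common(20)` would then list the twenty most frequent word forms. The sketch does not strip Markdown markup or normalize Greek diacritics, which a real preprocessing pipeline would likely need to handle.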