License: CC-BY-SA-4.0
Steward: Community
Dataset ID: cmqzgucgt009vmk070046obup
Task: CV
Release Date: 6/29/2026
Format: JPG, TSV
Size: 784.45 MB
LezizNet is an openly-licensed image dataset of Turkish cuisine, sourced from Openverse (which aggregates Flickr, Wikimedia Commons, iNaturalist and others) and directly from Wikimedia Commons. It contains 3,272 images covering 245 distinct dish labels; each row carries the original source URL, creator, and exact Creative Commons license for reproducible attribution. Some images carry more than one label (a plate of "kuru fasulye, pilav" is tagged with both dishes), reflecting how Turkish meals are actually served. The dataset was built SEMI-AUTOMATICALLY: images were scraped and labels auto-derived (from titles/tags or the search query), then manually reviewed and cleaned. Concretely, ~120 Turkish food terms were queried, the results were deduplicated, non-Turkish and non-food images were filtered out through a two-stage manual + CLIP-based review (~40% of scraped images were rejected), and each remaining image was labeled from its title/tags, the search query, visual inspection, or by hand. Because labels are partly automatic, LABELING ERRORS REMAIN: the `food_name_source` and `label_confidence` columns record how each label was derived and how reliable it is. Corrections and contributions are welcome (contact below).
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
Restrictions/Special Constraints
Attribution to the original creators is required (per-image `creator` and `source_url` are provided). Share-alike applies: derivatives of CC-BY-SA images must be released under the same (or a BY-SA-compatible) license. Individual images carry their own CC license; the per-image `license` field is authoritative and must be honoured.
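As an illustration of per-image attribution, the sketch below reads `metadata.tsv` with Python's standard `csv` module and builds one credit line per image from the `title`, `creator`, `source_url`, `license` and `license_url` columns. The helper names (`load_metadata`, `attribution_line`), the credit-line format and the sample row are hypothetical, and the sketch assumes the file has a header row and standard CSV-style quoting. It is not an official LezizNet tool; the exact wording of a compliant credit should follow the terms of each image's own license.

```python
import csv
import io


def load_metadata(handle) -> list[dict]:
    """Read metadata.tsv (tab-separated, header row) into a list of dicts.

    Assumes standard CSV quoting; open real files with newline="" so that
    quoted multi-line fields (e.g. descriptions) are parsed correctly.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def attribution_line(row: dict) -> str:
    """Build a hypothetical credit line from one metadata row."""
    creator = row.get("creator") or "Unknown creator"
    title = row.get("title") or row.get("filename") or "untitled"
    parts = [f'"{title}" by {creator}', row["source_url"], row["license"]]
    if row.get("license_url"):
        parts.append(row["license_url"])
    return " | ".join(parts)


# Invented sample row, for illustration only.
sample = (
    "filename\ttitle\tcreator\tsource_url\tlicense\tlicense_url\n"
    "img_0001.jpg\tİskender kebap\tJane Doe\thttps://example.org/photo/1"
    "\tby-sa\thttps://creativecommons.org/licenses/by-sa/2.0/\n"
)
rows = load_metadata(io.StringIO(sample))
print(attribution_line(rows[0]))
```

For the real file, something like `with open("metadata.tsv", newline="", encoding="utf-8") as f: rows = load_metadata(f)` should work, assuming the file is UTF-8 encoded.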
Forbidden Usage
Use or redistribution without attribution to the original creators. Removing or violating the share-alike obligation on CC-BY-SA images. Claiming ownership of the images or the dataset.
Ethical Review
All images are sourced from publicly available Creative Commons-licensed content (Openverse, Wikimedia Commons) with per-image attribution and license preserved. The dataset may contain images in which people appear (e.g. diners, vendors, market scenes), retained intentionally as real-world context; no attempt is made to identify individuals. Known biases and limitations (source skew toward photogenic/restaurant dishes, regional and minority-cuisine imbalance within Türkiye, class imbalance, and label-granularity/reliability limits — labels are partly auto-derived and contain errors) are documented in README.md.
Intended Use
Fine-grained Turkish food classification; transfer learning and domain adaptation for food recognition; held-out evaluation of geographic/cultural bias in food models and vision-language models (dish naming, origin attribution, ingredient inference); cultural-heritage and gastronomy documentation; dietary-assessment and menu/restaurant applications.
Contents. 3,272 images (images/) and a metadata.tsv with one row per image.
Statistics.
Images: 3,272 | Distinct dish labels: 245 | Multi-label images: 380 (11.6%)
Sources: Wikimedia Commons 2,232, Flickr 1,038, iNaturalist 2 (via Openverse 3,181 + direct Wikimedia 91)
Licenses: CC-BY-SA 2,569 (79%), CC-BY 692 (21%), CC0/Public Domain 11
Label provenance (food_name_source): title/tags 2,326 (71%), search query 758 (23%), vision/AI-assisted 181 (6%), manual 7
Most frequent dishes: kebap (270), baklava (242), dolma (183), köfte (170), pilav (158), döner (155), simit (130), ayran (109), sarma (90), börek (84)
~1.4 MB average per image
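The figures above can in principle be recomputed from `metadata.tsv`. The following minimal sketch splits the comma-separated `food_name` field into labels and tallies images, distinct labels, multi-label images, licenses and label provenance. It assumes the column names listed under Fields; raw `license` values may be finer-grained (e.g. split by version) than the grouped categories shown above, and the helper names and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def split_labels(food_name: str) -> list[str]:
    """Split a comma-separated food_name into unique, stripped labels."""
    labels = (part.strip() for part in food_name.split(","))
    return list(dict.fromkeys(label for label in labels if label))


def summarize(rows: list[dict]) -> dict:
    """Tally headline statistics over metadata rows (one row per image)."""
    dish_counts = Counter()  # number of images carrying each label
    multi_label = 0
    for row in rows:
        labels = split_labels(row.get("food_name") or "")
        dish_counts.update(labels)
        if len(labels) > 1:
            multi_label += 1
    n = len(rows)
    return {
        "images": n,
        "distinct_labels": len(dish_counts),
        "multi_label_images": multi_label,
        "multi_label_pct": round(100 * multi_label / n, 1) if n else 0.0,
        "licenses": Counter(row.get("license", "") for row in rows),
        "label_sources": Counter(row.get("food_name_source", "") for row in rows),
        "top_dishes": dish_counts.most_common(10),
    }


# Invented three-row sample, for illustration only.
sample = (
    "filename\tfood_name\tlicense\tfood_name_source\n"
    "a.jpg\tkuru fasulye, pilav\tby-sa\ttitle/tags\n"
    "b.jpg\tbaklava\tby\tsearch query\n"
    "c.jpg\tpilav\tby-sa\ttitle/tags\n"
)
stats = summarize(list(csv.DictReader(io.StringIO(sample), delimiter="\t")))
print(stats["images"], stats["distinct_labels"], stats["multi_label_pct"])
```

`dict.fromkeys` removes duplicate labels within a row while preserving their order, so an image is counted at most once per dish.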
Fields (full descriptions in README.md): filepath, filename, source (hosting platform), origin_scrape (openverse / wikimedia_commons), source_id, title, food_name (dish label(s), comma-separated for multi-dish plates), label_confidence (high/low/empty), food_name_source (how the label was derived), query_used, creator, source_url, license, license_url, flickr_tags, clarifai_tags, description, food_score (CLIP food-likelihood). Every image is fully attributable via creator + source_url + license.
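Because `food_name` can hold several comma-separated dishes, a multi-label classifier typically needs multi-hot target vectors rather than a single class index. The sketch below is one minimal way to build them; `build_index` and `multi_hot` are hypothetical helpers, and sorting labels by Unicode code point is an arbitrary but deterministic choice of label order.

```python
from typing import Iterable


def build_index(food_names: Iterable[str]) -> dict[str, int]:
    """Map every distinct dish label to a column index (sorted order)."""
    labels = {
        part.strip()
        for name in food_names
        for part in name.split(",")
        if part.strip()
    }
    return {label: i for i, label in enumerate(sorted(labels))}


def multi_hot(food_name: str, index: dict[str, int]) -> list[int]:
    """Encode one food_name value as a 0/1 vector over the label index.

    Labels missing from the index (e.g. unseen at training time) are skipped.
    """
    vec = [0] * len(index)
    for part in food_name.split(","):
        label = part.strip()
        if label in index:
            vec[index[label]] = 1
    return vec


names = ["kuru fasulye, pilav", "baklava", "pilav"]
index = build_index(names)
print(index)  # label -> column
print(multi_hot("kuru fasulye, pilav", index))
```

Multi-hot targets of this kind are usually paired with an independent sigmoid per label and a binary cross-entropy loss, rather than a softmax over all labels, so that a plate of kuru fasulye and pilav can score high on both.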
Why this dataset. Food-recognition research has a documented double concentration: the large-scale datasets come from a handful of groups, and coverage is dominated by Western and East-Asian cuisines, while Turkish and Middle-Eastern cuisines are sparse or absent (e.g. the community dataset World Wide Dishes contains no Turkish dish). Existing Turkish food datasets (TurkishFoods-15/-25, Turkish Food-102) were web-crawled from image search with unstated rights, so they cannot be cleanly redistributed or used commercially. This dataset fills the underpopulated intersection of regional coverage AND clean licensing: to our knowledge it is the first openly-licensed Turkish food image dataset with reproducible, machine-readable per-image provenance. Food-specific pretraining has been shown to transfer substantially better than ImageNet baselines (Romero-Tapiador et al., 2024), so a clean regional corpus is useful as (a) in-domain fine-tuning data, (b) a held-out evaluation set exposing the geographic blind spots of food models and vision-language models, and (c) additional pretraining signal for an underrepresented cuisine. The multi-label design (real plates mix dishes) also makes it a testbed for intra-class variation and mixed-plate recognition, problems that single-label benchmarks hide.
Sources. Openverse (https://openverse.org) and Wikimedia Commons (https://commons.wikimedia.org). Methodology, sources, and full caveats are documented in README.md.
Construction, limitations & contributing. LezizNet is a semi-automatically created dataset: images were scraped and labels auto-derived, then manually reviewed and cleaned. It is not error-free — labeling errors remain, especially in auto-derived labels (food_name_source = search query or vision). The image set is also intentionally diverse: it includes Turkish food in real-world contexts (packaged products, people dining, market/serving scenes), not only isolated plated dishes, since modern vision-language models benefit from contextual imagery. This is a living dataset — corrections, additional images, and collaboration to improve it are welcome. Contact Alp Öktem (alp@oktem.me, https://alp.oktem.me/).
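For evaluation work where label noise matters, one conservative option is to keep only rows whose `label_confidence` is `high` and whose `food_name_source` does not indicate an auto-derived label. The sketch below assumes the provenance strings contain the substrings `search query` or `vision`, mirroring the categories named above; the exact values stored in the column may differ, so check README.md before relying on it. The function name and sample rows are hypothetical.

```python
import csv
import io

# Assumed markers for auto-derived provenance; the real column values
# are not specified in this card and may be spelled differently.
AUTO_SOURCE_MARKERS = ("search query", "vision")


def is_reliable(row: dict) -> bool:
    """Keep high-confidence rows whose label was not auto-derived."""
    if (row.get("label_confidence") or "").strip().lower() != "high":
        return False  # low or empty confidence
    source = (row.get("food_name_source") or "").strip().lower()
    return not any(marker in source for marker in AUTO_SOURCE_MARKERS)


# Invented sample rows, for illustration only.
sample = (
    "filename\tfood_name\tlabel_confidence\tfood_name_source\n"
    "a.jpg\tbaklava\thigh\ttitle/tags\n"
    "b.jpg\tdolma\thigh\tsearch query\n"
    "c.jpg\tsimit\tlow\ttitle/tags\n"
    "d.jpg\tsarma\t\tmanual\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
clean = [row for row in rows if is_reliable(row)]
print([row["filename"] for row in clean])
```

This trades coverage for label precision: the filtered subset is smaller, and its class distribution may differ from that of the full dataset.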
Please check README.md for more information.