Task: CV
Release Date: 4/6/2026
Format: JSON, JPEG
Size: 281.05 MB
This dataset provides 116 images showing varying quantities of a range of objects, each paired with a question about a quantity in the image and a numeric answer. For example, an image of a table with multiple fruits and vegetables is paired with the question "How many carrots?". Other objects appearing in the dataset include rocks, brooms, clothespins, money, motorcycles, screws, and paperclips.
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Restrictions/Special Constraints
N/A
Forbidden Usage
N/A
Intended Use
To evaluate/benchmark multimodal vision models on their ability to distinguish, count, and reason about the quantities of common objects.
The dataset is intended for evaluating multimodal vision models on their ability to identify specific objects and keep track of quantities. Many cases involve only counting, whereas some also require reading text in the image (e.g., a bag of some product labeled with the number of items inside).
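As a rough illustration of how such an evaluation might be scored, the sketch below compares a model's free-text replies against the numeric answers, reporting exact-match accuracy and mean absolute error. The record layout (the field names "image", "question", and "answer"), the sample records, and the helper names parse_count and score are all assumptions made for illustration; they are not the dataset's documented schema.

```python
import json
import re

# Hypothetical records: the field names and values below are assumptions,
# not taken from the dataset's actual JSON files.
SAMPLE_JSON = """
[
  {"image": "img_001.jpg", "question": "How many carrots?", "answer": 3},
  {"image": "img_002.jpg", "question": "How many screws?", "answer": 12}
]
"""


def parse_count(text):
    """Return the first integer found in a free-text reply, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def score(records, replies):
    """Return (exact-match accuracy, mean absolute error).

    Replies with no parseable integer count as wrong for accuracy
    and are left out of the mean absolute error.
    """
    correct = 0
    abs_errors = []
    for record, reply in zip(records, replies):
        predicted = parse_count(reply)
        if predicted is None:
            continue
        target = int(record["answer"])
        if predicted == target:
            correct += 1
        abs_errors.append(abs(predicted - target))
    accuracy = correct / len(records) if records else 0.0
    mae = sum(abs_errors) / len(abs_errors) if abs_errors else None
    return accuracy, mae


records = json.loads(SAMPLE_JSON)
# Stand-in model outputs; a real run would query a vision model per image.
replies = ["There are 3 carrots.", "I count 10 screws"]
accuracy, mae = score(records, replies)
print(accuracy, mae)
```

Reporting mean absolute error alongside exact match is one possible design choice: it distinguishes a model that is off by one from a model that is off by an order of magnitude, which exact match alone would not.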
The data was collected by photographing varying quantities of different household items with mobile phones (Phone and ZTE Blade) and labeling each image with the quantities shown.