ViQua² — Visual Question-answering about Quantities

ViQuA² (Visual Question Answering about Quantities) is an evaluation dataset for visual reasoning about quantities.

This dataset provides 116 images showing varying quantities of a range of objects. Each image is paired with a question about a quantity in the image and a numeric answer. For example, one image shows a table with multiple fruits and vegetables, paired with the question "How many carrots?". Other objects appearing in this dataset include rocks, brooms, clothes pins, money, motorcycles, screws, and paperclips.

Overview

This dataset is intended for evaluating multimodal computer vision (CV) models on their ability to identify specific objects and keep track of quantities. Many cases involve only counting, whereas some also require reading text in the image (e.g., a bag of some product labeled with the number of items inside).
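As an illustration of how such an evaluation might be scored, the sketch below computes exact-match accuracy on the numeric answers. It is not an official scoring script: the record keys ("image", "question", "answer") and the predict callable are hypothetical, since the dataset's file layout is not described here, and taking the first number in a model's reply is only one possible parsing convention.

```python
import re
from typing import Callable, Iterable, Mapping, Optional


def parse_count(reply: str) -> Optional[float]:
    """Return the first number found in a free-text reply, or None."""
    # Drop thousands separators so that "1,200" parses as 1200.
    match = re.search(r"-?\d+(?:\.\d+)?", reply.replace(",", ""))
    return float(match.group()) if match else None


def exact_match_accuracy(
    records: Iterable[Mapping[str, object]],
    predict: Callable[[object, str], str],
) -> float:
    """Fraction of records whose parsed prediction equals the labeled quantity.

    Each record is assumed (hypothetically) to carry "image", "question",
    and "answer" keys; `predict` stands in for any model wrapper that maps
    an image and a question to a text reply.
    """
    records = list(records)
    if not records:
        return 0.0
    correct = 0
    for record in records:
        reply = predict(record["image"], str(record["question"]))
        predicted = parse_count(reply)
        # A reply with no number in it counts as incorrect.
        if predicted is not None and predicted == float(record["answer"]):
            correct += 1
    return correct / len(records)
```

Exact match is a strict criterion; a tolerance-based or relative-error metric may be more informative for images with large counts.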

Data Collection

The data was collected by photographing varying quantities of different household items with mobile phones (Phone and ZTE Blade) and labeling each photo with the quantity shown.
