Less is More Corpus | Mozilla Data Collective

Description

This repository contains the dataset associated with the paper "Less is More? The Role of Demographic Author Information in Emotion Classification of Ambiguous Text". Emotion annotation is inherently subjective, often resulting in low agreement between annotators.

Specifics

Licensing

Creative Commons Attribution 4.0 International (CC-BY-4.0)

https://spdx.org/licenses/CC-BY-4.0.html

Considerations

Restrictions/Special Constraints

This dataset should only be used for research purposes.

Forbidden Usage

This data should not be used for personalized author profiling.

Metadata

This dataset supports research investigating whether providing annotators with demographic information about the text author reduces ambiguity and improves annotation consistency.

The dataset is derived from the crowd-enVENT corpus, which consists of personal event descriptions and associated emotion annotations.

📊 Dataset Description

Total texts: 250

Total annotators: 500

Source: crowd-enVENT corpus (subset with low agreement cases)
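
To make "low agreement" concrete, here is a minimal sketch of one simple agreement measure: the share of annotators who chose the most frequent label for a text. The measure, the 0.75 threshold, the text IDs, and the labels are all illustrative assumptions; this page does not state how low-agreement cases were selected or how the files are laid out.

```python
from collections import Counter


def majority_agreement(labels):
    """Return (majority_label, share of annotators who chose it)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical annotations for two texts; not taken from the dataset.
annotations = {
    "text_1": ["joy", "joy", "pride", "joy"],
    "text_2": ["anger", "sadness", "fear", "anger"],
}

# Per-text majority label and agreement share.
agreement = {tid: majority_agreement(labs) for tid, labs in annotations.items()}

# Assumed threshold: texts whose majority share is below 0.75 count as low agreement.
low_agreement_ids = [tid for tid, (_, share) in agreement.items() if share < 0.75]
```

Chance-corrected measures such as Krippendorff's alpha are often preferred in practice; the majority share is used here only because it is easy to read.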