Speech Corpus of English Learners from Mexico

This dataset contains read speech recordings from 8 English language learners from Mexico.

Data

Format

The dataset contains a TSV file, metadata.tsv, with the following columns:

audio_id: a key of the form speaker_id-audio_id
speaker_id
audio_filename
sentence: text
num attempts: the number of attempts taken to record the sentence. Speakers were asked to read each sentence as fluidly as possible and were encouraged to do retakes if they struggled during a reading.
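
The columns above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: it assumes metadata.tsv has a header row whose names match the column list, and that the attempts column is spelled "num attempts" in the header. The helper names and the sample rows are invented for illustration; adjust them to the actual file.

```python
import csv
import io
from collections import defaultdict

# Assumed header name for the attempts column; check the real metadata.tsv.
ATTEMPTS_COLUMN = "num attempts"


def read_metadata(handle):
    """Parse tab-separated metadata rows into a list of dicts."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        row[ATTEMPTS_COLUMN] = int(row[ATTEMPTS_COLUMN])
        rows.append(row)
    return rows


def attempts_per_speaker(rows):
    """Sum the recorded attempts for each speaker_id."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["speaker_id"]] += row[ATTEMPTS_COLUMN]
    return dict(totals)


# Invented sample rows, used only so the sketch runs without the dataset.
SAMPLE = (
    "audio_id\tspeaker_id\taudio_filename\tsentence\tnum attempts\n"
    "s01-0001\ts01\ts01-0001.wav\tWhere are you going?\t1\n"
    "s01-0002\ts01\ts01-0002.wav\tI don't know.\t2\n"
    "s02-0001\ts02\ts02-0001.wav\tWhere are you going?\t1\n"
)

rows = read_metadata(io.StringIO(SAMPLE))
totals = attempts_per_speaker(rows)

# With the real file, open it instead of the in-memory sample:
# with open("metadata.tsv", newline="", encoding="utf-8") as f:
#     rows = read_metadata(f)
```

Each row is a plain dict keyed by column name, so the audio_filename value can be used to locate the matching recording.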

Source text

The source text consists of 1,000 sentences taken from this multilingual readability corpus. The sentences come from OpenSubtitles and are between 1 and 10 words long.
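
The stated length range can be checked with a short script. This is a sketch under one assumption: that a word is a whitespace-separated token, which may differ from how the lengths were originally counted. The function names and example sentences are invented for illustration.

```python
def word_count(sentence):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(sentence.split())


def out_of_range(sentences, low=1, high=10):
    """Return the sentences whose word count falls outside [low, high]."""
    return [s for s in sentences if not low <= word_count(s) <= high]


# Invented examples: the second sentence has 12 tokens and is flagged.
examples = [
    "Stop.",
    "I told you we should have left before the rain started today.",
]
flagged = out_of_range(examples)
```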

Speakers

The 8 speakers are all L2 English learners living in Mexico. All but one are native speakers of Spanish only; the remaining speaker is a native bilingual in Nahuatl and Spanish. They are between 18 and 40 years old.

Questionnaires

Some of the speakers answered a short survey about their language experience. The answers are stored as text files in the speaker_questionnaires directory.
