Kanuri Books Corpus

Description

A text corpus of 10,281 randomized sentences (90,706 words) extracted from books by Kanuri authors Dr. Baba Kura Alkali Gazali, Lawan Dalama, Kaka Gana Abba, and Lawan Hassan. The corpus includes both original and normalized (lowercased, punctuation-removed) versions. It was compiled by CLEAR Global (formerly Translators without Borders) for the creation of open-source language technology. These sentences were also recorded by multiple speakers to make a speech corpus published within TWB Voice.
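The card does not spell out the exact normalization rules, so the following is only a minimal sketch of what "lowercased, punctuation-removed" might look like in practice. The function name `normalize_sentence` is hypothetical, and both the use of Unicode punctuation categories and the whitespace collapsing are assumptions rather than CLEAR Global's documented procedure.

```python
import unicodedata


def normalize_sentence(text: str) -> str:
    """Lowercase a sentence and drop punctuation.

    Assumption: "punctuation" means any character whose Unicode general
    category starts with "P". The corpus authors may have used other rules.
    """
    lowered = text.lower()
    # Keep only characters that are not classified as punctuation.
    kept = "".join(
        ch for ch in lowered
        if not unicodedata.category(ch).startswith("P")
    )
    # Collapse whitespace runs left behind where punctuation was removed.
    return " ".join(kept.split())
```

Comparing the output of such a function against the normalized version shipped with the corpus would show whether these assumptions match the published files.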

Specifics

Licensing

Creative Commons Attribution 4.0 International (CC-BY-4.0)

https://spdx.org/licenses/CC-BY-4.0.html

Considerations

Restrictions/Special Constraints

Attribution to CLEAR Global and the authors is required.

Forbidden Usage

The corpus must not be used to create harmful, threatening, defamatory, or deceptive content; to victimize or intimidate individuals or groups; to harm minors; for any purpose contrary to CLEAR Global's humanitarian mission; or in violation of applicable law.
