RUI: CRI: CI-ADDO-EN: Collaborative Research: MASC: A Community Resource For and By the People

Project: Research project

Project Details

Description

The Manually Annotated Sub-Corpus (MASC) is a shared corpus that supports research across several disciplines: linguistics, computational linguistics, psycholinguistics, sociolinguistics, and machine learning. It includes a wide variety of present-day American English texts annotated for several linguistic phenomena. Because MASC is a unique resource, considerable community momentum has grown up around it. This project builds on that momentum to enable the corpus to grow on its own and to address the need for additional annotations. The major activities are to: (1) provide web-based mechanisms that facilitate community contribution and use of MASC annotations; (2) develop means to more fully automate the annotation validation process; (3) extend the WordNet annotations to cover adjectives, to support research on the evaluation of "subjective" annotations and the harmonization of WordNet with other resources; (4) promote use of MASC and its new annotations by diverse groups, by sponsoring shared tasks that exploit the corpus's unique characteristics and by supporting beta-testers of software, data, and annotations; and (5) aggressively develop an "Open Language Data" community around MASC through workshops, tutorials, and active participation in relevant community activities.

MASC provides an unparalleled resource for training and testing natural language processing tools, which can enable a major leap in the productivity of NLP research and ultimately change the way people use and interact with computers. It is the first fully open, community-driven resource in the field. All data and annotations are freely distributed in a manner that permits immediate and easy access for users around the globe.

Status: Finished
Effective start/end date: 7/1/11 to 12/31/14

Funding

  • National Science Foundation: $86,183.00
