RUI: CRI: CI-ADDO-EN: Collaborative Research: MASC: A Community Resource For and By the People

Project: Research project

Project Details

Description

The Manually Annotated Sub-Corpus (MASC) is a shared corpus that supports research across several disciplines: linguistics, computational linguistics, psycholinguistics, sociolinguistics, and machine learning. It includes a wide variety of present-day American English texts annotated for several linguistic phenomena. Because MASC provides a unique resource, considerable community momentum has built up around it. This project builds on that momentum to enable the corpus to grow on its own and to address the need for additional annotations. The major activities are to: (1) provide web-based mechanisms to facilitate community contribution and use of MASC annotations; (2) develop means to more fully automate the annotation validation process; (3) extend the WordNet annotations to cover adjectives, to support research on evaluation of "subjective" annotations and harmonization of WordNet with other resources; (4) promote use of MASC and new annotations by diverse groups, by sponsoring shared tasks that exploit the corpus's unique characteristics and by supporting beta-testers of software, data, and annotations; and (5) aggressively develop an "Open Language Data" community around MASC through workshops, tutorials, and active participation in relevant community activities. MASC provides an unparalleled resource for training and testing natural language processing tools, which can enable a major leap in the productivity of NLP research and ultimately affect the way people use and interact with computers. It is the first fully open, community-driven resource in the field. All data and annotations are freely distributed in a manner that permits immediate and easy access for users around the globe.
Status: Finished
Effective start/end date: 7/1/11 → 12/31/14

Funding

  • National Science Foundation: $86,183.00
