Project Details
Description
The Manually Annotated Sub-Corpus (MASC) is a shared corpus that supports research across
several disciplines: linguistics, computational linguistics, psycholinguistics, sociolinguistics and
machine learning. It includes a wide variety of present-day American English texts annotated for
several linguistic phenomena. Because MASC provides a unique resource, considerable
community momentum has grown up around it. This project builds upon this momentum to
enable the corpus to grow on its own, and to address the need for additional annotations. The
major activities are to: (1) provide web-based mechanisms to facilitate community contribution
and use of MASC annotations; (2) develop means to more fully automate the annotation
validation process; (3) extend the WordNet annotations to cover adjectives, to support research
on evaluation of "subjective" annotations and harmonization of WordNet with other resources;
(4) promote use of MASC and new annotations by diverse groups, by sponsoring shared tasks
that exploit the corpus's unique characteristics and supporting beta-testers of software, data, and
annotations; and (5) aggressively develop an "Open Language Data" community around MASC
through workshops, tutorials, and active participation in relevant community activities.
MASC provides an unparalleled resource for training and testing of tools for natural language
processing, which can enable a major leap in the productivity of NLP research and ultimately
impact the way people use and interact with computers. It is the first fully open, community-driven
resource in the field. All data and annotations are freely distributed in a manner that
permits immediate and easy accessibility for users around the globe.
Status | Finished |
---|---|
Effective start/end date | 7/1/11 → 12/31/14 |
Funding
- National Science Foundation: $86,183.00