The Manually Annotated Sub-Corpus: A community resource for and by the people

Nancy Ide, Collin Baker, Christiane Fellbaum, Rebecca Jane Passonneau

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

77 Scopus citations

Abstract

The Manually Annotated Sub-Corpus (MASC) project provides data and annotations to serve as the base for a community-wide annotation effort of a subset of the American National Corpus. The MASC infrastructure enables the incorporation of contributed annotations into a single, usable format that can then be analyzed as it is or ported to any of a variety of other formats. MASC includes data from a much wider variety of genres than existing multiply-annotated corpora of English, and the project is committed to a fully open model of distribution, without restriction, for all data and annotations produced or contributed. As such, MASC is the first large-scale, open, community-based effort to create much-needed language resources for NLP. This paper describes the MASC project, its corpus, and its annotations, and serves as a call for contributions of data and annotations from the language processing community.

Original language: English (US)
Title of host publication: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages: 68-73
Number of pages: 6
State: Published - Dec 1 2010
Event: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: Jul 11 2010 to Jul 16 2010

Publication series

Name: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Other

Other: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country/Territory: Sweden
City: Uppsala
Period: 7/11/10 to 7/16/10

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
