Measuring agreement on set-valued items (MASI) for semantic and pragmatic annotation

Research output: Contribution to conference › Paper › peer-review

120 Scopus citations

Abstract

Annotation projects dealing with complex semantic or pragmatic phenomena face a dilemma: create annotation schemes that oversimplify the phenomena, or capture distinctions that conventional reliability metrics cannot measure adequately. The solution to the dilemma is to develop metrics that quantify the decisions that annotators are asked to make. This paper discusses MASI, a distance metric for comparing sets, and illustrates its use in quantifying the reliability of a specific dataset. Annotations of Summary Content Units (SCUs) generate models referred to as pyramids, which can be used to evaluate unseen human summaries or machine summaries. The paper presents reliability results for five pairs of pyramids created for document sets from the 2003 Document Understanding Conference (DUC). The annotators worked independently of each other. Differences between the application of MASI to pyramid annotation and its previous application to co-reference annotation are discussed. In addition, it is argued that a paradigmatic reliability study should relate measures of inter-annotator agreement to independent assessments, such as significance tests of the annotated variables with respect to other phenomena. In effect, what counts as sufficiently reliable inter-annotator agreement depends on the use to which the annotated data will be put.
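As a rough illustration of the metric the abstract describes, the following is a minimal Python sketch of MASI as it is commonly defined: the Jaccard coefficient of two sets scaled by a monotonicity factor that rewards subset relations. The handling of two empty sets and the exact weights shown are the commonly cited values, not details taken from this record.

```python
def masi(a, b):
    """MASI similarity between two sets: Jaccard coefficient times a
    monotonicity weight (1 for identity, 2/3 for a subset relation,
    1/3 for partial overlap, 0 for disjoint sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty annotations count as identical
    jaccard = len(a & b) / len(a | b)
    if a == b:
        m = 1.0
    elif a <= b or b <= a:
        m = 2 / 3
    elif a & b:
        m = 1 / 3
    else:
        m = 0.0
    return jaccard * m

# For distance-based reliability coefficients (e.g. Krippendorff's alpha),
# the corresponding distance is 1 - masi(a, b).
```

For example, `masi({1, 2}, {1, 2})` is 1.0, while `masi({1}, {1, 2})` is (1/2) x (2/3) = 1/3, reflecting that one annotator chose a strict subset of the other's labels. A similar distance (`masi_distance`) ships with NLTK's agreement metrics.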

Original language: English (US)
Pages: 831-836
Number of pages: 6
State: Published - 2006
Event: 5th International Conference on Language Resources and Evaluation, LREC 2006 - Genoa, Italy
Duration: May 22, 2006 - May 28, 2006

Other

Other: 5th International Conference on Language Resources and Evaluation, LREC 2006
Country/Territory: Italy
City: Genoa
Period: 5/22/06 - 5/28/06

All Science Journal Classification (ASJC) codes

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

