Relation between agreement measures on human labeling and machine learning performance: Results from an art history domain

Rebecca J. Passonneau, Tom Lippincott, Tae Yano, Judith Klavans

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We discuss factors that affect human agreement on a semantic labeling task in the art history domain, based on the results of four experiments in which we varied the number of labels annotators could assign, the number of annotators, the type and amount of training they received, and the size of the text span being labeled. Using the labelings from one experiment involving seven annotators, we investigate the relation between interannotator agreement and machine learning performance. We construct binary classifiers and vary the training and test data by swapping the labelings from the seven annotators. First, we find that performance is often quite good despite lower-than-recommended interannotator agreement. Second, we find that, on average, learning performance for a given functional semantic category correlates with the overall agreement among the seven annotators for that category. Third, we find that learning performance on the data from a given annotator does not correlate with the quality of that annotator's labeling. We offer recommendations for the use of labeled data in machine learning, and argue that learners should attempt to accommodate human variation. We also note implications for large-scale corpus annotation projects that deal with similarly subjective phenomena.
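
The design described in the abstract (per-category binary labels from several annotators, with training and test labelings swapped between annotators) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which agreement coefficient was used, so Cohen's kappa averaged over annotator pairs stands in here, and the function names `cohen_kappa`, `mean_pairwise_kappa`, and `swap_pairs` are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations, permutations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) labels.

    Assumption: kappa is used only as a stand-in agreement coefficient;
    the abstract does not name the measure the authors used.
    """
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Proportion of items on which the two annotators assign the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's rate of assigning label 1.
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(labelings):
    """Average kappa over all unordered annotator pairs for one category.

    `labelings` maps a (hypothetical) annotator id to that annotator's
    0/1 labels for the category, one label per text span.
    """
    pairs = list(combinations(sorted(labelings), 2))
    return sum(cohen_kappa(labelings[i], labelings[j]) for i, j in pairs) / len(pairs)


def swap_pairs(annotators):
    """Ordered (train, test) annotator pairs for a swap design.

    Assumption: only cross-annotator pairs are listed; the abstract does
    not say whether same-annotator train/test splits were also used.
    """
    return list(permutations(annotators, 2))


if __name__ == "__main__":
    # Toy labels for one category from three annotators (invented data).
    toy = {
        "A1": [1, 1, 0, 0, 1, 0, 1, 0],
        "A2": [1, 1, 0, 0, 1, 0, 0, 0],
        "A3": [1, 0, 0, 0, 1, 0, 1, 0],
    }
    print(mean_pairwise_kappa(toy))
    print(len(swap_pairs(range(7))))  # ordered pairs among seven annotators
```

With per-category agreement in hand, each ordered pair from `swap_pairs` would name one annotator's labels as training data and another's as test data for a binary classifier; the classifier itself is left out because the abstract does not specify it.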

Original language: English (US)
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 2841-2848
Number of pages: 8
ISBN (Electronic): 2951740840, 9782951740846
State: Published - 2008
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: May 28 2008 to May 30 2008

All Science Journal Classification (ASJC) codes

  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
  • Education
