Estimation of discourse segmentation labels from crowd data

Ziheng Huang, Jialu Zhong, Rebecca J. Passonneau

Research output: Chapter in Book/Report/Conference proceedingConference contribution

9 Scopus citations

Abstract

For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations where we assume judgments are not independent, based on the observation that an annotator's segments all tend to be several utterances long. The data consists of crowd labels for annotation of discourse segment boundaries. The new methods extend Hidden Markov Models to relax the independence assumption. The two methods are distinct, so positive labels proposed by both are taken to be ground truth. In addition, results of the models are checked using metrics that test whether an annotator's accuracy relative to a given model remains consistent across different conversations.

Original languageEnglish (US)
Title of host publicationConference Proceedings - EMNLP 2015
Subtitle of host publicationConference on Empirical Methods in Natural Language Processing
PublisherAssociation for Computational Linguistics (ACL)
Pages2190-2200
Number of pages11
ISBN (Electronic)9781941643327
DOIs
StatePublished - 2015
EventConference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal
Duration: Sep 17 2015Sep 21 2015

Publication series

NameConference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing

Other

OtherConference on Empirical Methods in Natural Language Processing, EMNLP 2015
Country/TerritoryPortugal
CityLisbon
Period9/17/159/21/15

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'Estimation of discourse segmentation labels from crowd data'. Together they form a unique fingerprint.

Cite this