TY - JOUR
T1 - Multiplicity and word sense
T2 - Evaluating and learning from multiply labeled word sense annotations
AU - Passonneau, Rebecca J.
AU - Bhardwaj, Vikas
AU - Salleb-Aouissi, Ansaf
AU - Ide, Nancy
N1 - Funding Information:
This work was supported by NSF award CRI-0708952, including a supplement to fund co-author Vikas Bhardwaj as a Graduate Research Assistant for one semester. The authors thank the annotators for their excellent work and thoughtful comments on sense inventories. We thank Bob Carpenter for discussions about data from multiple annotators, and for his generous and insightful comments on drafts of the paper. Finally, we thank the anonymous reviewers who provided deep and thoughtful critiques, as well as very careful proofreading.
PY - 2012/6
Y1 - 2012/6
AB - Supervised machine learning methods to model word sense often rely on human labelers to provide a single, ground truth label for each word in its context. We examine issues in establishing ground truth word sense labels using a fine-grained sense inventory from WordNet. Our data consist of a sentence corpus of 1,000 sentences: 100 for each of ten moderately polysemous words. Each word was given multiple sense labels, or a multilabel, from trained and untrained annotators. The multilabels give a nuanced representation of the degree of agreement on instances. A suite of assessment metrics is used to analyze the sets of multilabels, such as comparisons of sense distributions across annotators. Our assessment indicates that the general annotation procedure is reliable, but that words differ regarding how reliably annotators can assign WordNet sense labels, independent of the number of senses. We also investigate the performance of an unsupervised machine learning method to infer ground truth labels from various combinations of labels from the trained and untrained annotators. We find tentative support for the hypothesis that performance depends on the quality of the set of multilabels, independent of the number of labelers or their training.
UR - http://www.scopus.com/inward/record.url?scp=84866733052&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866733052&partnerID=8YFLogxK
DO - 10.1007/s10579-012-9188-x
M3 - Article
AN - SCOPUS:84866733052
SN - 1574-020X
VL - 46
SP - 219
EP - 252
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
IS - 2
ER -