This table is different: A wordnet-based approach to identifying references to document entities

Shomir Wilson, Alan W. Black, Jon Oberlander

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Writing intended to inform frequently contains references to document entities (DEs), a mixed class that includes orthographically structured items (e.g., illustrations, sections, lists) and discourse entities (arguments, suggestions, points). Such references are vital to the interpretation of documents, but they often eschew identifiers such as "Figure 1" for inexplicit phrases like "in this figure" or "from these premises". We examine inexplicit references to DEs, termed DE references, and recast the problem of their automatic detection into the determination of relevant word senses. We then show the feasibility of machine learning for the detection of DE-relevant word senses, using a corpus of human-labeled synsets from WordNet. We test cross-domain performance by gathering lemmas and synsets from three corpora: website privacy policies, Wikipedia articles, and Wikibooks textbooks. Identifying DE references will enable language technologies to use the information encoded by them, permitting the automatic generation of finely tuned descriptions of DEs and the presentation of richly structured information to readers.

Original language: English (US)
Title of host publication: Proceedings of the 8th Global WordNet Conference, GWC 2016
Editors: Corina Forascu, Christiane Fellbaum, Piek Vossen, Verginica Barbu Mititelu
Publisher: Global WordNet Association
Pages: 427-435
Number of pages: 9
ISBN (Electronic): 9789730207286
State: Published - 2016
Event: 8th Global WordNet Conference, GWC 2016 - Bucharest, Romania
Duration: Jan 27 2016 to Jan 30 2016

Publication series

Name: Proceedings of the 8th Global WordNet Conference, GWC 2016

Conference

Conference: 8th Global WordNet Conference, GWC 2016
Country/Territory: Romania
City: Bucharest
Period: 1/27/16 to 1/30/16

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
