TY - GEN
T1 - This table is different
T2 - 8th Global WordNet Conference, GWC 2016
AU - Wilson, Shomir
AU - Black, Alan W.
AU - Oberlander, Jon
N1 - Funding Information:
This research was supported in part by the National Science Foundation under grants OISE 11-59236 (Metalanguage Identification for Interactive Language Technologies) and CNS 13-30596 (Towards Effective Web Privacy Notice & Choice: A Multi-Disciplinary Perspective).
PY - 2016
Y1 - 2016
N2 - Writing intended to inform frequently contains references to document entities (DEs), a mixed class that includes orthographically structured items (e.g., illustrations, sections, lists) and discourse entities (arguments, suggestions, points). Such references are vital to the interpretation of documents, but they often eschew identifiers such as "Figure 1" for inexplicit phrases like "in this figure" or "from these premises". We examine inexplicit references to DEs, termed DE references, and recast the problem of their automatic detection into the determination of relevant word senses. We then show the feasibility of machine learning for the detection of DE-relevant word senses, using a corpus of human-labeled synsets from WordNet. We test cross-domain performance by gathering lemmas and synsets from three corpora: website privacy policies, Wikipedia articles, and Wikibooks textbooks. Identifying DE references will enable language technologies to use the information encoded by them, permitting the automatic generation of finely-tuned descriptions of DEs and the presentation of richly-structured information to readers.
AB - Writing intended to inform frequently contains references to document entities (DEs), a mixed class that includes orthographically structured items (e.g., illustrations, sections, lists) and discourse entities (arguments, suggestions, points). Such references are vital to the interpretation of documents, but they often eschew identifiers such as "Figure 1" for inexplicit phrases like "in this figure" or "from these premises". We examine inexplicit references to DEs, termed DE references, and recast the problem of their automatic detection into the determination of relevant word senses. We then show the feasibility of machine learning for the detection of DE-relevant word senses, using a corpus of human-labeled synsets from WordNet. We test cross-domain performance by gathering lemmas and synsets from three corpora: website privacy policies, Wikipedia articles, and Wikibooks textbooks. Identifying DE references will enable language technologies to use the information encoded by them, permitting the automatic generation of finely-tuned descriptions of DEs and the presentation of richly-structured information to readers.
UR - http://www.scopus.com/inward/record.url?scp=84962833998&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962833998&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84962833998
T3 - Proceedings of the 8th Global WordNet Conference, GWC 2016
SP - 427
EP - 435
BT - Proceedings of the 8th Global WordNet Conference, GWC 2016
A2 - Forascu, Corina
A2 - Fellbaum, Christiane
A2 - Vossen, Piek
A2 - Mititelu, Verginica Barbu
PB - Global WordNet Association
Y2 - 27 January 2016 through 30 January 2016
ER -