Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers

Jan Oliver Wallgrün, Frank Hardisty, Alan M. MacEachren, Morteza Karimzadeh, Yiting Ju, Scott Pezanowski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

This article presents an approach to building place reference corpora and applies it to create a Geo-Microblog Corpus that will foster research and development in microblog/Twitter geoparsing and geographic information retrieval. The corpus currently consists of 6000 tweets with identified and georeferenced place names; 30% of these tweets contain at least one place name. The corpus is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is designed to be applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and analyze the causes of disagreement among the laypersons who performed place identification in our crowdsourcing approach.

Original language: English (US)
Title of host publication: Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014
Editors: Ross S. Purves, Christopher B. Jones
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450331357
State: Published - Nov 4 2014
Event: 8th Workshop on Geographic Information Retrieval, GIR 2014 - Dallas, United States
Duration: Nov 4 2014 to Nov 7 2014

Publication series

Name: Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014

Other

Other: 8th Workshop on Geographic Information Retrieval, GIR 2014
Country/Territory: United States
City: Dallas
Period: 11/4/14 to 11/7/14

All Science Journal Classification (ASJC) codes

  • Geography, Planning and Development
