TY - GEN
T1 - Disambiguating road names in text route descriptions using Exact-All-Hop Shortest Path algorithm
AU - Zhang, Xiao
AU - Qiu, Baojun
AU - Mitra, Prasenjit
AU - Xu, Sen
AU - Klippel, Alexander
AU - MacEachren, Alan M.
PY - 2012
Y1 - 2012
N2 - Automatic extraction and understanding of human-generated route descriptions have been critical to research aiming at understanding human cognition of geospatial information. Among all research issues involved, road name disambiguation is the most important, because one road name can refer to more than one road. Compared with traditional toponym (place name) disambiguation, the challenges of disambiguating road names in human-generated route descriptions are three-fold: (1) the authors may use a wrong or obsolete road name and the gazetteer may have incomplete or out-of-date information; (2) geographic ontologies often used to disambiguate cities or counties do not exist for roads, due to their linear nature and large spatial extent; (3) knowledge of the co-occurrence of road names and other toponyms is difficult to learn due to the difficulty in automatic processing of natural language and the lack of external information sources on road entities. In this paper, we solve the problem of road name disambiguation in human-generated route descriptions with noise, i.e., in the presence of wrong names and an incomplete gazetteer. We model the problem as an Exact-All-Hop Shortest Path problem on a semi-complete directed k-partite graph, and design an efficient algorithm to solve it. Our disambiguation algorithm successfully handles the noisy data and does not require any extra information sources other than the gazetteer. We compared our algorithm with an existing map-based method. Experimental results show that our algorithm significantly outperforms the existing method.
AB - Automatic extraction and understanding of human-generated route descriptions have been critical to research aiming at understanding human cognition of geospatial information. Among all research issues involved, road name disambiguation is the most important, because one road name can refer to more than one road. Compared with traditional toponym (place name) disambiguation, the challenges of disambiguating road names in human-generated route descriptions are three-fold: (1) the authors may use a wrong or obsolete road name and the gazetteer may have incomplete or out-of-date information; (2) geographic ontologies often used to disambiguate cities or counties do not exist for roads, due to their linear nature and large spatial extent; (3) knowledge of the co-occurrence of road names and other toponyms is difficult to learn due to the difficulty in automatic processing of natural language and the lack of external information sources on road entities. In this paper, we solve the problem of road name disambiguation in human-generated route descriptions with noise, i.e., in the presence of wrong names and an incomplete gazetteer. We model the problem as an Exact-All-Hop Shortest Path problem on a semi-complete directed k-partite graph, and design an efficient algorithm to solve it. Our disambiguation algorithm successfully handles the noisy data and does not require any extra information sources other than the gazetteer. We compared our algorithm with an existing map-based method. Experimental results show that our algorithm significantly outperforms the existing method.
UR - http://www.scopus.com/inward/record.url?scp=84878805443&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878805443&partnerID=8YFLogxK
U2 - 10.3233/978-1-61499-098-7-876
DO - 10.3233/978-1-61499-098-7-876
M3 - Conference contribution
AN - SCOPUS:84878805443
SN - 9781614990970
T3 - Frontiers in Artificial Intelligence and Applications
SP - 876
EP - 881
BT - ECAI 2012 - 20th European Conference on Artificial Intelligence, 27-31 August 2012, Montpellier, France - Including Prestigious Applications of Artificial Intelligence (PAIS-2012) System Demonstration
PB - IOS Press BV
T2 - 20th European Conference on Artificial Intelligence, ECAI 2012
Y2 - 27 August 2012 through 31 August 2012
ER -