Inferring censored geo-information with non-representative data

Yu Zhang, Tse Chuan Yang, Stephen A. Matthews

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation


The goal of this study is to develop a method capable of inferring geo-locations for non-representative data. To protect the privacy of surveyed individuals, most data collectors release only coarse geo-information (e.g., tract) rather than detailed geo-information (e.g., street address, apartment number) when sharing survey data. Without exact locations, many point-based analyses cannot be performed. While several scholars have developed methods to address this issue, little attention has been paid to correcting for it when the data are not representative. To fill this knowledge gap, we propose a bias correction method that adjusts for the bias using a bias factor approach. Applying our method to an empirical data set with a known gender bias, we found that it generates reliable results despite the non-representativeness of the sample.
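The abstract does not specify how the bias factor is computed. As a hedged illustration only, the sketch below assumes a common post-stratification-style factor: for each group (e.g., gender), the factor is the group's population share divided by its share in the sample, and respondents are reweighted by it before being aggregated to tracts. The function names `bias_factors` and `corrected_tract_shares`, the `(tract, group)` input layout, and the choice of this particular factor are all hypothetical, not the authors' published method.

```python
from collections import Counter


def bias_factors(sample_groups, population_shares):
    """Hypothetical bias factor per group: population share / sample share.

    Groups over-represented in the sample get a factor below 1,
    under-represented groups get a factor above 1.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def corrected_tract_shares(records, population_shares):
    """Bias-corrected share of respondents per tract.

    records: list of (tract, group) pairs (assumed input layout).
    Each respondent contributes its group's bias factor instead of 1.
    """
    factors = bias_factors([g for _, g in records], population_shares)
    weighted = Counter()
    for tract, group in records:
        weighted[tract] += factors[group]
    total = sum(weighted.values())
    return {tract: w / total for tract, w in weighted.items()}


if __name__ == "__main__":
    # Toy sample: 6 women and 2 men, while the population is assumed 50/50.
    records = [("A", "F")] * 4 + [("B", "F")] * 2 + [("B", "M")] * 2
    shares = corrected_tract_shares(records, {"F": 0.5, "M": 0.5})
    print(shares)  # tract A about 0.333, tract B about 0.667 (uncorrected: 0.5 each)
```

In this toy case women are over-sampled, so tract A (all women) is down-weighted relative to tract B, which contains the under-sampled men. The corrected shares could then serve as a prior when allocating individuals to finer locations, again as an assumption rather than a description of the paper.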

Original language: English (US)
Title of host publication: Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings
Editors: Petra Perner
Publisher: Springer Verlag
Number of pages: 7
ISBN (Print): 9783319419190
State: Published - 2016
Event: 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016 - New York, United States
Duration: Jul 16 2016 → Jul 21 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016
Country/Territory: United States
City: New York

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

