A visual analytics approach to high-dimensional logistic regression modeling and its application to an environmental health study

Chong Zhang, Jing Yang, F. Benjamin Zhan, Xi Gong, Jean D. Brender, Peter H. Langlois, Scott Barlowe, Ye Zhao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

In the domain of epidemiology, logistic regression modeling is widely used to explain the relationships among explanatory variables and dichotomous outcome variables. However, logistic regression modeling faces challenges such as overfitting, confounding, and multicollinearity when there is a large number of explanatory variables. For example, in the birth defect study presented in this paper, variable selection for building high quality models to identify risk factors from hundreds of pollutant variables is difficult. To address this problem, we propose a novel visual analytics approach to logistic regression modeling for high-dimensional datasets. It leverages the traditional modeling pipeline by providing (1) intuitive visualizations for inspecting statistical indicators and the relationships among the variables and (2) a seamless, effective dimension reduction pipeline for selecting variables for inclusion in high quality logistic regression models. A fully working prototype of this approach has been developed and successfully applied to the birth defect study, which illustrates its effectiveness and efficiency. Its application in an insurance policy study and feedback from domain experts further demonstrate its usefulness.
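
The multicollinearity problem mentioned in the abstract can be illustrated with a simple pairwise correlation screen. The following is a minimal sketch, not the authors' method or prototype: the variable names (`pollutant_a`, `pollutant_b`, `pollutant_c`), the synthetic data, and the 0.9 threshold are all assumptions chosen for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.9):
    """Return (name_i, name_j, r) for every pair with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) >= threshold:
                flagged.append((names[i], names[j], r))
    return flagged


# Synthetic data (hypothetical, not from the study):
# pollutant_b is nearly a copy of pollutant_a, pollutant_c is unrelated.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(500)]
b = [v + 0.05 * rng.gauss(0, 1) for v in a]
c = [rng.gauss(0, 1) for _ in range(500)]
columns = {"pollutant_a": a, "pollutant_b": b, "pollutant_c": c}

flagged = correlated_pairs(columns)
for name_i, name_j, r in flagged:
    print(f"{name_i} ~ {name_j}: r = {r:.3f}")
```

A pair flagged this way is a candidate for dropping or merging one of its variables before fitting a logistic regression model. With hundreds of variables the number of pairs grows quadratically, which is one reason a visual overview of variable relationships can help an analyst.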

Original language: English (US)
Title of host publication: 2016 IEEE Pacific Visualization Symposium, PacificVis 2016 - Proceedings
Editors: Chuck Hansen, Ivan Viola, Xiaoru Yuan
Publisher: IEEE Computer Society
Pages: 136-143
Number of pages: 8
ISBN (Electronic): 9781509014514
State: Published - May 4 2016
Event: 9th IEEE Pacific Visualization Symposium, PacificVis 2016 - Taipei, Taiwan, Province of China
Duration: Apr 19 2016 → Apr 22 2016

Publication series

Name: IEEE Pacific Visualization Symposium
Volume: 2016-May
ISSN (Print): 2165-8765
ISSN (Electronic): 2165-8773

Conference

Conference: 9th IEEE Pacific Visualization Symposium, PacificVis 2016
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 4/19/16 → 4/22/16

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Software
