New variable selection methods for zero-inflated count data with applications to the substance abuse field

Anne Buu, Norman J. Johnson, Runze Li, Xianming Tan

Research output: Contribution to journal › Article › peer-review

45 Scopus citations

Abstract

Zero-inflated count data are very common in health surveys. This study develops new variable selection methods for the zero-inflated Poisson regression model. Our simulations demonstrate the negative consequences of ignoring zero-inflation. Among the competing methods, the one-step SCAD method is recommended because it has the highest specificity, sensitivity, and exact fit, and the lowest estimation error. The design of the simulations is based on the special features of two large national databases commonly used in the alcoholism and substance abuse field, so that our findings can be easily generalized to real settings. Applications of the methodology are demonstrated by empirical analyses of data from a well-known alcohol study.
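
As a rough illustration of the two ingredients named in the abstract, the sketch below assumes the standard textbook form of the zero-inflated Poisson distribution (a structural zero with probability `pi`, otherwise a Poisson count with mean `rate`) and the SCAD penalty as defined by Fan and Li. The function names, the parameter names, and the default `a = 3.7` are illustrative assumptions and are not taken from the paper; the one-step estimation procedure itself is not reproduced here.

```python
import math


def zip_pmf(y, rate, pi):
    """P(Y = y) under an assumed zero-inflated Poisson parametrization.

    With probability `pi` the observation is a structural zero;
    otherwise it is drawn from a Poisson distribution with mean `rate`.
    """
    poisson = math.exp(-rate) * rate ** y / math.factorial(y)
    if y == 0:
        # Zeros come from both the structural-zero and the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def scad_penalty(beta, tuning, a=3.7):
    """SCAD penalty for a single coefficient (requires a > 2, tuning > 0)."""
    b = abs(beta)
    if b <= tuning:
        # Small coefficients: same linear penalty as the lasso.
        return tuning * b
    if b <= a * tuning:
        # Middle range: quadratic piece that gradually relaxes the penalty.
        return (2 * a * tuning * b - b * b - tuning * tuning) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return tuning * tuning * (a + 1) / 2


# A plain Poisson model with the same rate understates the share of zeros.
p_zero_zip = zip_pmf(0, rate=2.0, pi=0.3)
p_zero_poisson = zip_pmf(0, rate=2.0, pi=0.0)
```

Under these assumed values, `p_zero_zip` exceeds `p_zero_poisson`, which is the kind of excess-zero mass a plain Poisson fit would miss. The constant tail of `scad_penalty` is what distinguishes it from a lasso penalty, which keeps growing with the coefficient.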

Original language: English (US)
Pages (from-to): 2326-2340
Number of pages: 15
Journal: Statistics in Medicine
Volume: 30
Issue number: 18
State: Published - Aug 15 2011

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Statistics and Probability
