Prediction-Based Structured Variable Selection through the Receiver Operating Characteristic Curves

Yuanjia Wang, Huaihou Chen, Runze Li, Naihua Duan, Roberto Lewis-Fernández

Research output: Contribution to journalArticlepeer-review

15 Scopus citations


In many clinical settings, a commonly encountered problem is to assess accuracy of a screening test for early detection of a disease. In these applications, predictive performance of the test is of interest. Variable selection may be useful in designing a medical test. An example is a research study conducted to design a new screening test by selecting variables from an existing screener with a hierarchical structure among variables: there are several root questions followed by their stem questions. The stem questions will only be asked after a subject has answered the root question. It is therefore unreasonable to select a model that only contains stem variables but not its root variable. In this work, we propose methods to perform variable selection with structured variables when predictive accuracy of a diagnostic test is the main concern of the analysis. We take a linear combination of individual variables to form a combined test. We then maximize a direct summary measure of the predictive performance of the test, the area under a receiver operating characteristic curve (AUC of an ROC), subject to a penalty function to control for overfitting. Since maximizing empirical AUC of the ROC of a combined test is a complicated nonconvex problem (Pepe, Cai, and Longton, 2006, Biometrics62, 221-229), we explore the connection between the empirical AUC and a support vector machine (SVM). We cast the problem of maximizing predictive performance of a combined test as a penalized SVM problem and apply a reparametrization to impose the hierarchical structure among variables. We also describe a penalized logistic regression variable selection procedure for structured variables and compare it with the ROC-based approaches. We use simulation studies based on real data to examine performance of the proposed methods. Finally we apply developed methods to design a structured screener to be used in primary care clinics to refer potentially psychotic patients for further specialty diagnostics and treatment.

Original languageEnglish (US)
Pages (from-to)896-905
Number of pages10
Issue number3
StatePublished - Sep 2011

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Agricultural and Biological Sciences(all)
  • Applied Mathematics


Dive into the research topics of 'Prediction-Based Structured Variable Selection through the Receiver Operating Characteristic Curves'. Together they form a unique fingerprint.

Cite this