Matched gene selection and committee classifier for molecular classification of heterogeneous diseases

Guoqiang Yu, Yuanjian Feng, David Jonathan Miller, Jianhua Xuan, Eric P. Hoffman, Robert Clarke, Ben Davidson, Ie Ming Shih, Yue Wang

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


Microarray gene expressions provide new opportunities for molecular classification of heterogeneous diseases. Although various reported classification schemes show impressive performance, most existing gene selection methods are suboptimal and are not well-matched to the unique characteristics of the multicategory classification problem. Matched design of the gene selection method and a committee classifier is needed for identifying a small set of gene markers that achieve accurate multicategory classification while being both statistically reproducible and biologically plausible. We report a simpler and yet more accurate strategy than previous works for multicategory classification of heterogeneous diseases. Our method selects the union of one-versus-everyone (OVE) phenotypic up-regulated genes (PUGs) and matches this gene selection with a one-versus-rest support vector machine (OVRSVM). Our approach provides even-handed gene resources for discriminating both neighboring and well-separated classes. Consistent with the OVRSVM structure, we evaluated the fold changes of OVE gene expressions and found that only a small number of high-ranked genes were required to achieve superior accuracy for multicategory classification. We tested the proposed PUG-OVRSVM method on six real microarray gene expression data sets (five public benchmarks and one in-house data set) and two simulation data sets, observing significantly improved performance with lower error rates, fewer marker genes, and higher performance sustainability, as compared to several widely-adopted gene selection and classification methods. The MATLAB toolbox, experiment data and supplement files are available at
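The gene selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' MATLAB toolbox. It assumes that a gene counts as a PUG for a class when its mean expression in that class exceeds its mean in every other class, and that the OVE fold change is the ratio of that class mean to the largest mean among the other classes. The function names `class_means` and `select_pugs`, the parameter `n_per_class`, and the requirement of non-negative, linear-scale expression values are all illustrative assumptions.

```python
from collections import defaultdict


def class_means(X, y):
    """Return {label: [mean expression of each gene within that class]}."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for g, value in enumerate(row):
            sums[label][g] += value
        counts[label] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def select_pugs(X, y, n_per_class):
    """Union of the top-ranked up-regulated genes of each class.

    X: list of samples, each a list of non-negative expression values.
    y: list of class labels, one per sample (at least two classes).
    n_per_class: hypothetical cutoff on genes kept per class.
    """
    means = class_means(X, y)
    labels = sorted(means)
    n_genes = len(X[0])
    selected = set()
    for k in labels:
        scored = []
        for g in range(n_genes):
            # Largest mean of this gene among all *other* classes.
            best_other = max(means[j][g] for j in labels if j != k)
            # Assumed PUG rule: class k must beat every other class.
            if best_other > 0 and means[k][g] > best_other:
                fold_change = means[k][g] / best_other
                scored.append((fold_change, g))
        scored.sort(reverse=True)  # highest fold change first
        selected.update(g for _, g in scored[:n_per_class])
    return sorted(selected)
```

The returned gene indices would then be used to restrict the expression matrix before training a one-versus-rest SVM, so that each binary classifier receives genes chosen under the same one-versus-everyone comparison it performs.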

Original language: English (US)
Pages (from-to): 2141-2167
Number of pages: 27
Journal: Journal of Machine Learning Research
State: Published - Aug 2010

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability

