TY - JOUR
T1 - Margin-maximizing feature elimination methods for linear and nonlinear kernel-based discriminant functions
AU - Aksu, Yaman
AU - Miller, David J.
AU - Kesidis, George
AU - Yang, Qing X.
N1 - Funding Information:
Manuscript received July 02, 2008; revised September 10, 2009 and January 09, 2010; accepted January 10, 2010. Date of publication February 25, 2010; date of current version April 30, 2010. This work was supported in part by the National Institutes of Health (NIH) under Grant R01 AG027771.
PY - 2010/5
Y1 - 2010/5
N2 - Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom, the hyperplane's intercept and its squared 2-norm, with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
AB - Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom, the hyperplane's intercept and its squared 2-norm, with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
UR - http://www.scopus.com/inward/record.url?scp=77951939084&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951939084&partnerID=8YFLogxK
U2 - 10.1109/TNN.2010.2041069
DO - 10.1109/TNN.2010.2041069
M3 - Article
C2 - 20194055
AN - SCOPUS:77951939084
SN - 1045-9227
VL - 21
SP - 701
EP - 717
JO - IEEE Transactions on Neural Networks
JF - IEEE Transactions on Neural Networks
IS - 5
M1 - 5419999
ER -