TY - JOUR
T1 - Variable selection via additive conditional independence
AU - Lee, Kuang Yao
AU - Li, Bing
AU - Zhao, Hongyu
N1 - Funding Information:
We are grateful to the Associate Editor and two referees for their constructive comments and helpful suggestions. We also thank Dr Xiting Yan for her support on the enrichment analysis. Our research was supported in part by National Science Foundation grants DMS-1106815 and DMS-1407537 awarded to Bing Li, and National Science Foundation grant DMS-1106738 and National Institutes of Health grants R01 GM59507 and P01 CA154295 awarded to Hongyu Zhao.
Publisher Copyright:
© 2016 Royal Statistical Society
PY - 2016/11/1
Y1 - 2016/11/1
AB - We propose a non-parametric variable selection method which does not rely on any regression model or predictor distribution. The method is based on a new statistical relationship, called additive conditional independence, that has been introduced recently for graphical models. Unlike most existing variable selection methods, which target the mean of the response, the method proposed targets a set of attributes of the response, such as its mean, variance or entire distribution. In addition, the additive nature of this approach offers non-parametric flexibility without employing multi-dimensional kernels. As a result it retains high accuracy for high dimensional predictors. We establish estimation consistency, convergence rate and variable selection consistency of the method proposed. Through simulation comparisons we demonstrate that the method proposed performs better than existing methods when the predictor affects several attributes of the response, and it performs competently in the classical setting where the predictors affect the mean only. We apply the new method to a data set concerning how gene expression levels affect the weight of mice.
UR - http://www.scopus.com/inward/record.url?scp=84958817789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958817789&partnerID=8YFLogxK
U2 - 10.1111/rssb.12150
DO - 10.1111/rssb.12150
M3 - Article
AN - SCOPUS:84958817789
SN - 1369-7412
VL - 78
SP - 1037
EP - 1055
JO - Journal of the Royal Statistical Society. Series B: Statistical Methodology
JF - Journal of the Royal Statistical Society. Series B: Statistical Methodology
IS - 5
ER -