TY - JOUR
T1 - Sure explained variability and independence screening
AU - Chen, Min
AU - Lian, Yimin
AU - Chen, Zhao
AU - Zhang, Zhengjun
N1 - Publisher Copyright: © 2017, © American Statistical Association and Taylor & Francis 2017.
PY - 2017/10/2
Y1 - 2017/10/2
AB - In the era of Big Data, extracting the most important exploratory variables from ultrahigh-dimensional data plays a key role in scientific research. Existing research has mainly focused on using the extracted exploratory variables to describe the central tendency of the related response variables. For a response variable, however, variability is as important as central tendency in statistical inference. This paper focuses on variability and proposes a new model-free feature screening approach: sure explained variability and independence screening (SEVIS). The core of SEVIS is to take advantage of the recently proposed asymmetric and nonlinear generalised measures of correlation in the screening. Under some mild conditions, the paper shows that SEVIS not only possesses the desired sure screening and ranking consistency properties, but is also a computationally convenient variable selection method for ultrahigh-dimensional data sets with more features than observations. The superior performance of SEVIS, compared with existing model-free methods, is illustrated in extensive simulations. A real example in ultrahigh-dimensional variable selection demonstrates that the variables selected by SEVIS better explain not only the response variables, but also the variables selected by other methods.
UR - http://www.scopus.com/inward/record.url?scp=85029669846&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029669846&partnerID=8YFLogxK
U2 - 10.1080/10485252.2017.1375111
DO - 10.1080/10485252.2017.1375111
M3 - Article
AN - SCOPUS:85029669846
SN - 1048-5252
VL - 29
SP - 849
EP - 883
JO - Journal of Nonparametric Statistics
JF - Journal of Nonparametric Statistics
IS - 4
ER -