TY - JOUR
T1 - Model-Free Feature Screening and FDR Control With Knockoff Features
AU - Liu, Wanjun
AU - Ke, Yuan
AU - Liu, Jingyuan
AU - Li, Runze
N1 - Publisher Copyright: © 2020 American Statistical Association.
PY - 2022
Y1 - 2022
N2 - This article proposes a model-free and data-adaptive feature screening method for ultrahigh-dimensional data. The proposed method is based on the projection correlation, which measures the dependence between two random vectors. This projection correlation-based method does not require specifying a regression model, and applies to data in the presence of heavy tails and multivariate responses. It enjoys both sure screening and rank consistency properties under weak assumptions. A two-step approach, with the help of knockoff features, is advocated to specify the threshold for feature screening such that the false discovery rate (FDR) is controlled under a prespecified level. The proposed two-step approach enjoys both sure screening and FDR control simultaneously if the prespecified FDR level is greater than or equal to 1/s, where s is the number of active features. The superior empirical performance of the proposed method is illustrated by simulation examples and real data applications. Supplementary materials for this article are available online.
AB - This article proposes a model-free and data-adaptive feature screening method for ultrahigh-dimensional data. The proposed method is based on the projection correlation, which measures the dependence between two random vectors. This projection correlation-based method does not require specifying a regression model, and applies to data in the presence of heavy tails and multivariate responses. It enjoys both sure screening and rank consistency properties under weak assumptions. A two-step approach, with the help of knockoff features, is advocated to specify the threshold for feature screening such that the false discovery rate (FDR) is controlled under a prespecified level. The proposed two-step approach enjoys both sure screening and FDR control simultaneously if the prespecified FDR level is greater than or equal to 1/s, where s is the number of active features. The superior empirical performance of the proposed method is illustrated by simulation examples and real data applications. Supplementary materials for this article are available online.
UR - http://www.scopus.com/inward/record.url?scp=85088251602&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088251602&partnerID=8YFLogxK
U2 - 10.1080/01621459.2020.1783274
DO - 10.1080/01621459.2020.1783274
M3 - Article
AN - SCOPUS:85088251602
SN - 0162-1459
VL - 117
SP - 428
EP - 443
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 537
ER -