TY - JOUR
T1 - Bandwidth selection for kernel distribution function estimation
AU - Altman, Naomi
AU - Léger, Christian
N1 - Funding Information:
Key words: Distribution function; Nonparametric estimation; Smoothing parameter selection; Cross-validation; Leave-one-out estimator. * Corresponding author. 1 Supported by Hatch Grant 151410 NYF. 2 Supported by NSERC (Canada) and FCAR (Quebec).
PY - 1995/8/1
Y1 - 1995/8/1
N2 - Leave-one-out cross-validation is a popular and readily implemented heuristic for bandwidth selection in nonparametric smoothing problems. In this note we elucidate the role of leave-one-out selection criteria by discussing a criterion introduced by Sarda (J. Statist. Plann. Inference 35 (1993) 65-75) for bandwidth selection for kernel distribution function estimators (KDFEs). We show that for this problem, use of the leave-one-out KDFE in the selection procedure is asymptotically equivalent to leaving none out. This contrasts with kernel density estimation, where use of the leave-one-out density estimator in the selection procedure is critical. Unfortunately, simulations show that neither method works in practice, even for samples of size as large as 1000. In fact, we show that for any fixed bandwidth, the expected value of the derivative of the leave-none-out criterion is asymptotically positive. This result and our simulations suggest that the criteria are increasing and that for sufficiently large samples (e.g., n = 100), the smallest available bandwidth will always be selected, thus contradicting the optimality result of Sarda for this estimator. As an alternative to minimizing a selection criterion, we propose a plug-in estimator of the asymptotically optimal bandwidth. Simulations suggest that the plug-in is a good estimator of the asymptotically optimal bandwidth even for samples as small as 10 observations and is not too far from the finite sample bandwidth.
AB - Leave-one-out cross-validation is a popular and readily implemented heuristic for bandwidth selection in nonparametric smoothing problems. In this note we elucidate the role of leave-one-out selection criteria by discussing a criterion introduced by Sarda (J. Statist. Plann. Inference 35 (1993) 65-75) for bandwidth selection for kernel distribution function estimators (KDFEs). We show that for this problem, use of the leave-one-out KDFE in the selection procedure is asymptotically equivalent to leaving none out. This contrasts with kernel density estimation, where use of the leave-one-out density estimator in the selection procedure is critical. Unfortunately, simulations show that neither method works in practice, even for samples of size as large as 1000. In fact, we show that for any fixed bandwidth, the expected value of the derivative of the leave-none-out criterion is asymptotically positive. This result and our simulations suggest that the criteria are increasing and that for sufficiently large samples (e.g., n = 100), the smallest available bandwidth will always be selected, thus contradicting the optimality result of Sarda for this estimator. As an alternative to minimizing a selection criterion, we propose a plug-in estimator of the asymptotically optimal bandwidth. Simulations suggest that the plug-in is a good estimator of the asymptotically optimal bandwidth even for samples as small as 10 observations and is not too far from the finite sample bandwidth.
UR - http://www.scopus.com/inward/record.url?scp=0001149797&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0001149797&partnerID=8YFLogxK
U2 - 10.1016/0378-3758(94)00102-2
DO - 10.1016/0378-3758(94)00102-2
M3 - Article
AN - SCOPUS:0001149797
SN - 0378-3758
VL - 46
SP - 195
EP - 214
JO - Journal of Statistical Planning and Inference
JF - Journal of Statistical Planning and Inference
IS - 2
ER -
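The abstract contrasts the leave-one-out and leave-none-out forms of Sarda's selection criterion for a kernel distribution function estimator (KDFE) and proposes a plug-in bandwidth as an alternative. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes a Gaussian kernel, so that the KDFE is the average of Phi((x - X_j) / h). It assumes the criterion is the mean squared gap between the empirical distribution function F_n(X_i) and the KDFE evaluated at X_i, computed either with X_i left out or with nothing left out. In place of the paper's plug-in estimator, it uses a simpler normal-reference version of the asymptotically optimal bandwidth, h = (4 sigma^3 / n)^(1/3), which follows from the AMISE minimiser h = [psi(K) / (n mu_2(K)^2 R(f'))]^(1/3) with psi(K) = 1/sqrt(pi), mu_2(K) = 1 and R(f') = 1/(4 sqrt(pi) sigma^3). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Standard normal: its CDF is the integrated Gaussian kernel (assumed kernel choice).
_STD_NORMAL = NormalDist()


def kdfe(x, data, h):
    """Gaussian-kernel distribution function estimate at x: mean of Phi((x - X_j) / h)."""
    return sum(_STD_NORMAL.cdf((x - xj) / h) for xj in data) / len(data)


def ecdf(x, data):
    """Empirical distribution function F_n(x) = #{X_j <= x} / n."""
    return sum(1 for xj in data if xj <= x) / len(data)


def sarda_cv(data, h, leave_one_out=True):
    """Assumed form of Sarda's criterion: mean over i of (F_n(X_i) - KDFE(X_i))^2,
    where the KDFE omits X_i (leave-one-out) or uses every point (leave-none-out)."""
    data = list(data)
    total = 0.0
    for i, xi in enumerate(data):
        sample = data[:i] + data[i + 1:] if leave_one_out else data
        total += (ecdf(xi, data) - kdfe(xi, sample, h)) ** 2
    return total / len(data)


def normal_reference_bandwidth(data):
    """AMISE-optimal Gaussian-kernel KDFE bandwidth when f is normal:
    h = (4 sigma^3 / n)^(1/3), with sigma the sample standard deviation.
    A simplified stand-in, NOT the plug-in estimator proposed in the paper."""
    data = list(data)
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    return (4.0 * sigma ** 3 / n) ** (1.0 / 3.0)


if __name__ == "__main__":
    random.seed(1)
    sample = [random.gauss(0.0, 1.0) for _ in range(100)]
    grid = [0.05 * k for k in range(1, 11)]  # candidate bandwidths 0.05 .. 0.50
    for loo in (True, False):
        scores = {h: sarda_cv(sample, h, leave_one_out=loo) for h in grid}
        # The abstract reports that such criteria tend to increase in h, so the
        # minimiser may simply be the smallest bandwidth on the grid.
        print("leave-one-out" if loo else "leave-none-out", min(scores, key=scores.get))
    print("normal-reference h:", normal_reference_bandwidth(sample))
```

Comparing the two printed minimisers with the normal-reference value gives a quick, informal check of the behaviour the abstract describes. Note that the normal-reference rule is only exact when the data are Gaussian; the paper's plug-in instead estimates the unknown quantity from the data.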