大规模数据分析中基于外推的调节参数选取

Translated title of the contribution: Extrapolation-based tuning parameters selection in massive data analysis

Haojie Ren, Changliang Zou, Runze Li

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Many statistical modeling procedures involve one or more tuning parameters that control model complexity. Examples include the bandwidth in kernel smoothing for nonparametric regression and density estimation, and the regularization parameter in regularization methods for feature selection in high-dimensional modeling. Tuning parameter selection plays a critical role in statistical modeling and machine learning. In massive data analysis, commonly used methods such as grid search with information criteria become prohibitively expensive computationally, and their feasibility is questionable even on modern parallel computing platforms. This paper develops a fast algorithm that efficiently approximates the best tuning parameters. The algorithm entails (a) assuming a parametric model that describes the trend between the best tuning parameters and the sample size, (b) establishing the trend by fitting the model to subsampled data, and (c) extrapolating this trend to the case of a huge sample size. To determine the subsample sizes, we derive optimal designs for settings with a constraint on the total computational budget. We show that the proposed designs possess an asymptotic optimality property. Our numerical studies demonstrate that, with a simple two-parameter polynomial model, the proposed algorithm performs almost equivalently to the procedure using the full data set in several different statistical settings, while significantly reducing computing time and storage.
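Steps (a) to (c) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two-parameter trend takes the power-law form lambda(n) = c * n^b, fits it by least squares on the log scale, and uses synthetic "best" tuning parameters in place of values that would in practice come from running a selection method on each subsample. The function names, subsample sizes, and constants are all hypothetical.

```python
import numpy as np


def fit_power_law(sizes, best_params):
    """Fit log(lambda) = log(c) + b * log(n) by least squares; return (c, b).

    Assumed trend model (an illustration, not taken from the paper):
    lambda(n) = c * n**b.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_lam = np.log(np.asarray(best_params, dtype=float))
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    b, log_c = np.polyfit(log_n, log_lam, deg=1)
    return float(np.exp(log_c)), float(b)


def extrapolate(c, b, n_full):
    """Evaluate the fitted trend at the full-data sample size."""
    return c * float(n_full) ** b


# Hypothetical subsample sizes for step (b).
sizes = [1_000, 2_000, 4_000, 8_000]

# Synthetic stand-in for the best tuning parameter found on each subsample,
# generated from an n**(-1/5) rate like that of a kernel bandwidth.
best_params = [1.06 * n ** (-0.2) for n in sizes]

c, b = fit_power_law(sizes, best_params)

# Step (c): extrapolate to a hypothetical full sample size of ten million.
lam_full = extrapolate(c, b, 10_000_000)
print(c, b, lam_full)
```

Only the small subsamples are ever passed through the expensive selection step; the full data set enters solely through its size, which is where the savings in computing time and storage come from.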

Original language: Chinese (Traditional)
Pages (from-to): 689-708
Number of pages: 20
Journal: Scientia Sinica Mathematica
Volume: 52
Issue number: 6
State: Published - Jun 2022

All Science Journal Classification (ASJC) codes

  • General Mathematics
