Optimal subsampling for quantile regression in big data

Haiying Wang, Yanyuan Ma

Research output: Contribution to journal › Article › peer-review

108 Scopus citations

Abstract

We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the asymptotic variance-covariance matrix for a linearly transformed parameter estimator and the other minimizes that of the original parameter estimator. The former does not depend on the densities of the responses given covariates and is easy to implement. Algorithms based on optimal subsampling probabilities are proposed, and the asymptotic distributions and asymptotic optimality of the resulting estimators are established. Furthermore, we propose an iterative subsampling procedure based on the optimal subsampling probabilities in the linearly transformed parameter estimation, which has great scalability to utilize available computational resources. In addition, this procedure yields standard errors for parameter estimators without estimating the densities of the responses given the covariates. We provide numerical examples based on both simulated and real data to illustrate the proposed method.
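
The two-step idea described in the abstract (a pilot fit, then nonuniform subsampling and an inverse-probability-weighted fit) can be illustrated with a minimal Python sketch. The abstract does not state the probability formula, so the density-free form used here, with the probability of point i proportional to |tau - 1{y_i - x_i'beta < 0}| times the Euclidean norm of x_i, is an assumption chosen to match the description "does not depend on the densities of the responses given covariates". The function names are hypothetical, and the iteratively reweighted least squares (IRLS) solver is a simple stand-in for a proper quantile regression solver, not the article's algorithm.

```python
import numpy as np


def weighted_quantile_fit(X, y, tau, weights=None, n_iter=100, eps=1e-6):
    """Approximate weighted quantile regression via IRLS (stand-in solver)."""
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting value
    for _ in range(n_iter):
        resid = y - X @ beta
        # Check loss |r| * |tau - 1{r < 0}| rewritten as a weighted squared loss.
        irls_w = w * np.abs(tau - (resid < 0)) / np.maximum(np.abs(resid), eps)
        sw = np.sqrt(irls_w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


def subsampling_probabilities(X, y, beta_pilot, tau):
    """Assumed density-free probabilities: |tau - 1{resid < 0}| * ||x_i||, normalized."""
    resid = y - X @ beta_pilot
    scores = np.abs(tau - (resid < 0)) * np.linalg.norm(X, axis=1)
    return scores / scores.sum()


def two_step_estimate(X, y, tau, r0, r, rng):
    """Pilot fit on a uniform subsample, then a weighted fit on a nonuniform one."""
    n = X.shape[0]
    pilot_idx = rng.choice(n, size=r0, replace=True)
    beta_pilot = weighted_quantile_fit(X[pilot_idx], y[pilot_idx], tau)
    pi = subsampling_probabilities(X, y, beta_pilot, tau)
    idx = rng.choice(n, size=r, replace=True, p=pi)
    # Inverse-probability weights correct for the nonuniform sampling.
    beta = weighted_quantile_fit(X[idx], y[idx], tau, weights=1.0 / (n * pi[idx]))
    return beta, pi


# Simulated data (illustrative only): median regression with standard normal noise.
rng = np.random.default_rng(0)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

beta_hat, pi = two_step_estimate(X, y, tau=0.5, r0=500, r=2000, rng=rng)
print(beta_hat)
```

Only 2,500 of the 20,000 simulated points enter a regression fit here; the full data are touched once, to compute the sampling probabilities. In practice a linear-programming or interior-point quantile regression solver would likely replace the IRLS approximation.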

Original language: English (US)
Pages (from-to): 99-112
Number of pages: 14
Journal: Biometrika
Volume: 108
Issue number: 1
State: Published - Mar 1 2021

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • General Mathematics
  • Agricultural and Biological Sciences (miscellaneous)
  • General Agricultural and Biological Sciences
  • Statistics, Probability and Uncertainty
  • Applied Mathematics
