Statistical inference in massive data sets

Runze Li, Dennis K.J. Lin, Bing Li

Research output: Contribution to journal › Article › peer-review

73 Scopus citations

Abstract

Analysis of massive data sets is challenging owing to limitations of computer primary memory. In this paper, we propose an approach to estimate population parameters from a massive data set. The proposed approach significantly reduces the required amount of primary memory, and the resulting estimate is as efficient as if the entire data set were analyzed simultaneously. Asymptotic properties of the resulting estimator are studied, and its asymptotic normality is established. A standard error formula for the resulting estimate is proposed and empirically tested; thus, statistical inference for parameters of interest can be performed. The effectiveness of the proposed approach is illustrated using simulation studies and an Internet traffic data example.

Original language: English (US)
Pages (from-to): 399-409
Number of pages: 11
Journal: Applied Stochastic Models in Business and Industry
Volume: 29
Issue number: 5
State: Published - Sep 2013

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • General Business, Management and Accounting
  • Management Science and Operations Research
