On principal components and regression: A statistical explanation of a natural phenomenon

Andreas Artemiou, Bing Li

Research output: Contribution to journal › Article › peer-review

26 Scopus citations


In this note we give a probabilistic explanation of a phenomenon that is frequently observed but whose reason is not well understood. That is, in a regression setting, the response (Y) is often highly correlated with the leading principal components of the predictor (X) even though there seems to be no logical reason for this connection. This phenomenon has long been noticed and discussed in the literature, and has received renewed interest recently because of the need for regressing Y on X of very high dimension, often with comparatively few sampling units, in which case it seems natural to regress on the first few principal components of X. This work stems from a discussion of a recent paper by Cook (2007) which, along with other developments, described a historical debate surrounding, and current interest in, this phenomenon.
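The phenomenon described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's argument; it rests on assumptions that the abstract does not state: the covariance matrix of X is drawn at random (here as a scaled Wishart-type matrix), the regression coefficient `beta` is drawn from a spherical normal distribution independently of that covariance, and Y follows a linear model with unit noise variance. The function names `squared_correlations` and `fraction_first_beats_last` are hypothetical. Under these assumptions, the code computes the population squared correlation between Y and each principal component of X and counts how often the first component beats the last.

```python
import numpy as np


def squared_correlations(sigma, beta, noise_var=1.0):
    """Population squared correlation between Y = beta'X + eps and each
    principal component of X, ordered from largest to smallest eigenvalue.

    Assumes a linear model with noise variance `noise_var` (illustrative).
    """
    eigvals, eigvecs = np.linalg.eigh(sigma)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # reorder: leading PC first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    var_y = beta @ sigma @ beta + noise_var
    # cov(Y, v_i'X) = lambda_i * (v_i'beta) and var(v_i'X) = lambda_i
    return eigvals * (eigvecs.T @ beta) ** 2 / var_y


def fraction_first_beats_last(p=5, trials=2000, seed=0):
    """Fraction of random (sigma, beta) draws in which the first PC is more
    strongly correlated with Y than the last PC."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        a = rng.standard_normal((p, p))
        sigma = a @ a.T / p                       # random covariance (assumption)
        beta = rng.standard_normal(p)             # independent of sigma (assumption)
        rho2 = squared_correlations(sigma, beta)
        wins += rho2[0] > rho2[-1]
    return wins / trials


frac = fraction_first_beats_last()
print(f"first PC beats last PC in {frac:.1%} of trials")
```

Because `beta` has no preferred direction in this setup, any advantage of the leading component comes only from its larger variance, which is one way to see why a connection can appear without a logical link between Y and the principal components.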

Original language: English (US)
Pages (from-to): 1557-1565
Number of pages: 9
Journal: Statistica Sinica
Issue number: 4
State: Published - Oct 1 2009

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

