PCA in High Dimensions: An orientation

Proc IEEE Inst Electr Electron Eng. 2018 Aug;106(8):1277-1292. doi: 10.1109/JPROC.2018.2846730. Epub 2018 Jul 18.

Abstract

When the data are high dimensional, widely used multivariate statistical methods such as principal component analysis can behave in unexpected ways. In settings where the dimension of the observations is comparable to the sample size, upward bias in sample eigenvalues and inconsistency of sample eigenvectors are among the most notable phenomena that appear. These phenomena, and the limiting behavior of the rescaled extreme sample eigenvalues, have recently been investigated in detail under the spiked covariance model. The behavior of the bulk of the sample eigenvalues under weak distributional assumptions on the observations has been described. These results have been exploited to develop new estimation and hypothesis testing methods for the population covariance matrix. Furthermore, partly in response to these phenomena, alternative classes of estimation procedures have been developed by exploiting sparsity of the eigenvectors or the covariance matrix. This paper gives an orientation to these areas.

Keywords: Marchenko-Pastur distribution; Tracy-Widom law; phase transition phenomena; principal component analysis; random matrix theory; spiked covariance model.