Statistical challenges of high-dimensional data

Iain M Johnstone; D Michael Titterington

doi:10.1098/rsta.2009.0159

Philos Trans A Math Phys Eng Sci. 2009 Nov 13;367(1906):4237-53. doi: 10.1098/rsta.2009.0159.

Authors

Iain M Johnstone¹, D Michael Titterington

Affiliation

¹ Department of Statistics, Stanford University, Stanford, CA 94305, USA.

Abstract

Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data, and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue.
