Principal components analysis

Methods Mol Biol. 2013:930:527-47. doi: 10.1007/978-1-62703-059-5_22.

Abstract

Principal components analysis (PCA) is a standard tool in multivariate data analysis that reduces the number of dimensions while retaining as much of the data's variation as possible. Instead of investigating thousands of original variables, the first few components containing the majority of the data's variation are explored. The visualization and statistical analysis of these new variables, the principal components, can help to find similarities and differences between samples. Important original variables that are the major contributors to the first few components can be discovered as well. This chapter seeks to deliver a conceptual understanding of PCA as well as a mathematical description. We describe how PCA can be used to analyze different datasets, and we include practical code examples. Possible shortcomings of the methodology and ways to overcome these problems are also discussed.
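
The ideas summarized above (component scores for comparing samples, loadings for identifying influential original variables, and the share of variation each component retains) can be illustrated with a minimal sketch. This is not the chapter's own code: it assumes Python with NumPy, computes PCA through a singular value decomposition of the column-centered data, and uses a synthetic dataset; the function name pca and all variable names are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Illustrative PCA via SVD of the column-centered data matrix.

    Rows of X are samples, columns are original variables.
    Returns (scores, loadings, explained_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # samples in component space
    loadings = Vt[:n_components].T               # variable weights per component
    explained_ratio = (s ** 2) / np.sum(s ** 2)  # share of total variance
    return scores, loadings, explained_ratio[:n_components]


# Synthetic data (an assumption for illustration): 50 samples, 200 variables,
# driven by one hidden factor plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
weights = rng.normal(size=(1, 200))
X = latent @ weights + 0.1 * rng.normal(size=(50, 200))

scores, loadings, explained_ratio = pca(X, n_components=2)

# Original variables contributing most strongly to the first component.
top_variables = np.argsort(np.abs(loadings[:, 0]))[::-1][:5]

print("variance explained by PC1, PC2:", explained_ratio)
print("top contributing variables for PC1:", top_variables)
```

Plotting the first column of scores against the second gives the usual sample map, while top_variables lists the column indices with the largest absolute loadings on the first component. Whether variables should also be scaled to unit variance before the decomposition depends on the dataset and is left out of this sketch.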

MeSH terms

  • Body Height
  • Body Weight
  • Codon / genetics
  • Escherichia coli / metabolism
  • Humans
  • Metabolome
  • Principal Component Analysis*
  • Sequence Analysis, DNA
  • Statistics as Topic
  • Students
  • Time Factors

Substances

  • Codon