The evolution of human cells in terms of protein innovation

Mol Biol Evol. 2014 Jun;31(6):1364-74. doi: 10.1093/molbev/mst139. Epub 2014 Apr 1.

Abstract

Humans are composed of hundreds of cell types. As the genomic DNA of each somatic cell is identical, cell type is determined by what is expressed and when. Until recently, little had been reported about the determinants of human cell identity, particularly from the joint perspective of gene evolution and expression. Here, we chart the evolutionary past of all documented human cell types via the collective histories of proteins, the principal product of gene expression. FANTOM5 data provide cell-type-specific digital expression of human protein-coding genes, and the SUPERFAMILY resource provides protein domain annotation. The evolutionary epoch in which each protein was created is inferred by comparison with the domain annotation of all other completely sequenced genomes. Studying the distribution across epochs of the genes expressed in each cell type yields insights into human cellular evolution in terms of protein innovation. For each cell type, its history of protein innovation is charted based on the genes it expresses. Combining the histories of all cell types enables us to create a timeline of cell evolution. This timeline identifies the possibility that our common ancestor Coelomata (cavity-forming animals) provided the innovation required for the innate immune system, whereas cells that now form the human brain have followed a trajectory of continually accumulating novel proteins since Opisthokonta (boundary of animals and fungi). We conclude that exaptation of existing domain architectures into new contexts is the dominant source of cell-type-specific domain architectures.
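
The per-cell-type analysis described above can be illustrated with a minimal sketch: given an epoch of origin for each gene and the set of genes expressed in each cell type, compute the fraction of expressed genes that falls in each epoch. Everything here is an assumption for illustration only: the gene names, cell-type labels, epoch assignments, the abbreviated epoch list, and the helper `epoch_profile` are hypothetical, and this is not the authors' pipeline, which works from FANTOM5 expression data and SUPERFAMILY domain annotation.

```python
from collections import Counter

# Hypothetical, abbreviated list of epochs ordered from oldest to youngest.
EPOCHS = ["Cellular organisms", "Eukaryota", "Opisthokonta", "Coelomata", "Homo sapiens"]

# Toy data (assumed): epoch in which each gene's protein is inferred to have arisen.
gene_epoch = {
    "geneA": "Cellular organisms",
    "geneB": "Eukaryota",
    "geneC": "Opisthokonta",
    "geneD": "Coelomata",
    "geneE": "Coelomata",
}

# Toy data (assumed): genes expressed in each cell type.
expressed = {
    "immune_like": {"geneA", "geneD", "geneE"},
    "neuron_like": {"geneA", "geneB", "geneC", "geneD"},
}


def epoch_profile(genes, gene_epoch, epochs):
    """Return the fraction of the given genes assigned to each epoch."""
    # Genes without an epoch assignment are skipped.
    counts = Counter(gene_epoch[g] for g in genes if g in gene_epoch)
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in epochs}


# One epoch distribution per cell type.
profiles = {
    cell_type: epoch_profile(genes, gene_epoch, EPOCHS)
    for cell_type, genes in expressed.items()
}

for cell_type, profile in profiles.items():
    print(cell_type, profile)
```

Comparing such profiles across cell types is one simple way to see which epochs contribute disproportionately to a given cell type; the published analysis may normalise and combine the histories differently.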

Keywords: CAGE; evolution; protein domains; transcriptome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Eukaryotic Cells
  • Evolution, Molecular*
  • Humans
  • Immunity, Innate
  • Phylogeny*
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics*
  • Sequence Analysis, Protein
  • Transcriptome

Substances

  • Proteins