Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences

Mol Genet Genomics. 2016 Jun;291(3):1127-36. doi: 10.1007/s00438-016-1170-7. Epub 2016 Jan 30.

Abstract

Protein evolution plays an important role in the evolution of each genome. Because proteins are functional molecules, their different parts or sites are generally subject to different selective constraints, particularly from purifying selection. Most previous studies of protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences; less attention has been paid to how different parts within each protein of a given genome evolve. To address this, we used the PfamA annotation of all human proteins to split each protein sequence into two parts: domains and unassigned regions. Single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were then mapped to one of two classes: SNPs within protein domains and SNPs within unassigned regions. Under this classification, the density of synonymous SNPs is significantly greater within domains than within unassigned regions, whereas the density of non-synonymous SNPs shows the opposite pattern. We also found signatures of purifying selection on both domains and unassigned regions, with selection significantly stronger on domains than on unassigned regions. In addition, across all human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
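The domain versus unassigned classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`classify_snp`, `snp_densities`), the input layout (1-based, inclusive, non-overlapping domain intervals in residue coordinates) and the definition of density as SNP count per coding nucleotide are all assumptions made for illustration, and the toy data are invented.

```python
from collections import Counter


def classify_snp(aa_pos, domains):
    """Return 'domain' if the 1-based residue position lies inside any
    domain interval (inclusive bounds), otherwise 'unassigned'."""
    for start, end in domains:
        if start <= aa_pos <= end:
            return "domain"
    return "unassigned"


def snp_densities(protein_length, domains, snps):
    """Compute SNPs per coding nucleotide for each (region, effect) pair.

    protein_length: number of residues in the protein.
    domains: list of (start, end) residue intervals, assumed non-overlapping.
    snps: iterable of (aa_pos, effect), effect being 'syn' or 'nonsyn'.
    Density definition is an assumption: count / (3 * residues in region).
    """
    domain_residues = sum(end - start + 1 for start, end in domains)
    region_nt = {
        "domain": 3 * domain_residues,
        "unassigned": 3 * (protein_length - domain_residues),
    }
    counts = Counter((classify_snp(pos, domains), effect) for pos, effect in snps)
    return {
        (region, effect): counts[(region, effect)] / nt
        for region, nt in region_nt.items()
        if nt > 0  # skip regions of zero length to avoid division by zero
        for effect in ("syn", "nonsyn")
    }


# Invented toy protein: 100 residues, two hypothetical domain intervals.
toy_domains = [(11, 40), (61, 80)]
toy_snps = [
    (15, "syn"), (20, "syn"), (70, "syn"), (5, "syn"),
    (30, "nonsyn"), (50, "nonsyn"), (90, "nonsyn"), (95, "nonsyn"),
]
toy_result = snp_densities(100, toy_domains, toy_snps)
```

In a genome-wide analysis the same counts would be pooled across all proteins before comparing domains with unassigned regions; the statistical test used for that comparison is not specified in the abstract and is left out here.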

Keywords: Human genome; Natural selection; Protein domain; Protein-coding sequence; SNPs.

MeSH terms

  • Amino Acid Sequence
  • Chromosome Mapping
  • Evolution, Molecular
  • Gene Frequency
  • Humans
  • Open Reading Frames
  • Polymorphism, Single Nucleotide*
  • Protein Domains
  • Proteins / chemistry*
  • Proteins / genetics*
  • Selection, Genetic

Substances

  • Proteins