The Protein-Coding Human Genome: Annotating High-Hanging Fruits

Bioessays. 2019 Nov;41(11):e1900066. doi: 10.1002/bies.201900066. Epub 2019 Sep 23.

Abstract

The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy by combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (they can be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternatively spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanisms and protein structure has been incorporated into an algorithm to predict neighboring homologous exons, which are often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome necessitates distinctive annotations that incorporate differences between individuals and between populations.

Keywords: alternative splicing; human genome annotation; human pan-genome; micro-exon; mutually exclusive exons; protein-coding genes; wobble splicing.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Alternative Splicing / genetics
  • Animals
  • Exons / genetics
  • Genome, Human / genetics*
  • Genomics / methods
  • Humans
  • Proteins / genetics*
  • RNA Splicing / genetics
  • Transcriptome / genetics

Substances

  • Proteins