Genomic fossils as a snapshot of the human transcriptome

Proc Natl Acad Sci U S A. 2006 Jan 31;103(5):1364-9. doi: 10.1073/pnas.0509330103. Epub 2006 Jan 23.

Abstract

Processed pseudogenes (PPGs) are cDNA sequences that were generated through reverse transcription of mature, spliced mRNAs and have subsequently been reinserted at a new genomic location. These cDNA sequences are usually no longer transcribed and are considered "dead on arrival." Here we show that PPGs can be used to generate a map of the transcriptome. By analyzing thousands of human PPGs, we were able to discover hundreds of previously unidentified transcript variants. Experimental verification of a subset of these variants by RT-PCR indicates that most of them are still active in the human transcriptome. Furthermore, we demonstrate that PPGs can enable the identification of ancient splice variants that were expressed ancestrally but are now extinct. Our results show that the genome itself carries a "virtual cDNA library" that can readily be used to analyze both present and ancestral transcripts. Our approach can be applied to sequenced metazoan genomes to computationally annotate splicing variation even when expressed sequences are unavailable.
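Because a PPG is an intronless copy of a spliced mRNA, the order of parent-gene exons it contains records which exon-exon junctions existed in the transcript that was reverse transcribed. The sketch below illustrates only that junction-comparison idea and is not the authors' pipeline. It assumes the PPG has already been aligned to its parent gene, giving half-open `(start, end)` blocks in parent-gene coordinates. It also assumes exon coordinates and known transcript structures are available. The function names, the `min_overlap` threshold, and the toy coordinates are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]   # half-open [start, end) in parent-gene coordinates
Junction = Tuple[str, str]   # (upstream exon, downstream exon)


def exon_chain(ppg_blocks: List[Interval],
               exons: Dict[str, Interval],
               min_overlap: float = 0.9) -> List[str]:
    """Hypothetical helper: list, in genomic order, the parent-gene exons
    covered (>= min_overlap of their length) by the PPG alignment blocks."""
    chain = []
    for name, (start, end) in sorted(exons.items(), key=lambda kv: kv[1][0]):
        covered = sum(max(0, min(end, e) - max(start, s)) for s, e in ppg_blocks)
        if covered >= min_overlap * (end - start):
            chain.append(name)
    return chain


def junctions(chain: List[str]) -> Set[Junction]:
    """Adjacent exons in an intronless copy imply a splice junction."""
    return set(zip(chain, chain[1:]))


def novel_junctions(ppg_chain: List[str],
                    known_transcripts: List[List[str]]) -> List[Junction]:
    """Junctions implied by the PPG but absent from every known transcript."""
    known: Set[Junction] = set()
    for transcript in known_transcripts:
        known |= junctions(transcript)
    return sorted(junctions(ppg_chain) - known)


if __name__ == "__main__":
    # Toy parent gene with four exons and one annotated transcript E1-E2-E3-E4.
    exons = {"E1": (0, 100), "E2": (200, 300), "E3": (400, 500), "E4": (600, 700)}
    chain = exon_chain([(0, 100), (200, 300), (600, 700)], exons)
    print(chain)                                            # ['E1', 'E2', 'E4']
    print(novel_junctions(chain, [["E1", "E2", "E3", "E4"]]))  # [('E2', 'E4')]
```

In this toy case the PPG lacks exon E3, so its E2-E4 junction would be flagged as a candidate exon-skipping variant. Such a candidate would still need experimental checking, for example by RT-PCR.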

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Alternative Splicing
  • Base Sequence
  • DNA, Complementary / metabolism
  • Exons
  • Expressed Sequence Tags
  • Fossils*
  • Gene Library
  • Genome*
  • Genome, Human*
  • Humans
  • Models, Genetic
  • Molecular Sequence Data
  • Phylogeny
  • Polymerase Chain Reaction
  • Pseudogenes
  • RNA, Messenger / metabolism*
  • Reverse Transcriptase Polymerase Chain Reaction
  • Software
  • Tissue Distribution
  • Transcription, Genetic*

Substances

  • DNA, Complementary
  • RNA, Messenger