New Genomic Signals Underlying the Emergence of Human Proto-Genes

Genes (Basel). 2022 Jan 31;13(2):284. doi: 10.3390/genes13020284.

Abstract

De novo genes are novel genes that emerge from non-coding DNA. So far, little is known about how the properties of de novo genes relate to their age and mechanisms of emergence. In this study, we investigate four related properties: introns, upstream regulatory motifs, 5' untranslated regions (UTRs) and protein domains, in 23,135 human proto-genes. We found that proto-genes contain introns, whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic genes are more enriched in enhancers, even though the TATA motif is the one most commonly found upstream of these genes. The 5' UTRs of intergenic and intronic proto-genes have a lower potential to stabilise mRNA structures than those of exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5' UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impact these properties.
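
As an illustration of what a scan for an upstream motif such as the TATA motif can look like, the following is a minimal sketch and not the method used in the study. The function name `find_tata_boxes` and the simplified consensus TATAWAW (W = A or T) are assumptions made for this example; real analyses typically score sites with position weight matrices rather than an exact pattern.

```python
import re

# Simplified TATA-box consensus TATAWAW (W = A or T).
# Assumption for illustration only; not the motif model used in the study.
# The lookahead lets overlapping matches be reported as well.
TATA_PATTERN = re.compile(r"(?=TATA[AT]A[AT])")


def find_tata_boxes(upstream_seq):
    """Return 0-based start positions of TATA-box-like sites in a DNA string."""
    seq = upstream_seq.upper()  # accept lower-case (e.g. soft-masked) input
    return [match.start() for match in TATA_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical upstream sequence, invented for the example.
    example = "GGCTATAAAAGGC"
    print(find_tata_boxes(example))
```

Counting such hits per upstream region, grouped by genomic position of emergence (exonic, intronic, intergenic), is one simple way to compare motif frequencies between groups.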

Keywords: 5′ UTRs; human proto-genes; introns; protein domains; regulatory motifs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5' Untranslated Regions
  • Exons / genetics
  • Genomics*
  • Humans
  • Introns / genetics
  • Promoter Regions, Genetic

Substances

  • 5' Untranslated Regions