On the relation of gene essentiality to intron structure: a computational and deep learning approach

Life Sci Alliance. 2021 Apr 27;4(6):e202000951. doi: 10.26508/lsa.202000951. Print 2021 Jun.

Abstract

Essential genes have been studied through copy number variants and deletions, both of which are associated with introns. The premise of our work is that the introns of essential genes have distinct characteristic properties. We support this premise by training a deep learning model and demonstrating that introns alone can be used to classify essentiality. When limited to first introns, the model performs better, implicating first introns in essentiality. We identify unique properties of the introns of essential genes and find that their structure protects against deletion and intron-loss events, particularly around the first intron. We show that GC density is increased in the first introns of essential genes, allowing for increased enhancer activity, protection against deletions, and improved splice site recognition. We find that the first introns of essential genes are remarkably smaller than those of nonessential genes and that, to protect against common 3' end deletion events, essential genes carry an increased number of (smaller) introns. To demonstrate the importance of the seven features we identified, we train a feature-based model using only these features and achieve high performance.
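Several of the properties above (first-intron GC density, first-intron length, and the number of introns per gene) are simple per-gene measurements. The sketch below shows one way such features could be computed, assuming each gene's introns are available as DNA sequence strings in 5'-to-3' transcript order. The names `gc_fraction`, `intron_features`, and `IntronFeatures` are hypothetical. This is an illustration only: it is not the authors' pipeline and not their seven-feature set.

```python
from dataclasses import dataclass


@dataclass
class IntronFeatures:
    # Hypothetical, illustrative feature set; NOT the paper's seven features.
    intron_count: int          # number of introns in the gene
    first_intron_length: int   # length of the first intron in bases
    first_intron_gc: float     # GC fraction of the first intron


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (ignores N etc.)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def intron_features(introns: list[str]) -> IntronFeatures:
    """Summarize a gene's introns, assumed to be listed in 5'->3' order."""
    if not introns:
        raise ValueError("gene has no introns")
    first = introns[0]
    return IntronFeatures(
        intron_count=len(introns),
        first_intron_length=len(first),
        first_intron_gc=gc_fraction(first),
    )


if __name__ == "__main__":
    # Toy sequences for illustration only; real introns are far longer.
    print(intron_features(["GTAAGCGCAG", "GTATTTAG"]))
```

Vectors like these could then be fed to any standard classifier; choosing which features matter, and how to weight them, is exactly what the study's feature-based model addresses.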

MeSH terms

  • Alternative Splicing / genetics
  • Base Sequence / genetics
  • Computational Biology / methods
  • DNA Copy Number Variations / genetics
  • Databases, Genetic
  • Deep Learning
  • Exons / genetics
  • Genes, Essential / genetics*
  • Genes, Essential / physiology
  • Humans
  • INDEL Mutation / genetics
  • Introns / genetics*
  • Introns / physiology