Splice site recognition - deciphering Exon-Intron transitions for genetic insights using Enhanced integrated Block-Level gated LSTM model

Gene. 2024 Jul 15:915:148429. doi: 10.1016/j.gene.2024.148429. Epub 2024 Apr 3.

Abstract

Bioinformatics is a contemporary interdisciplinary area focused on analyzing the growing number of genome sequences. Gene variants are differences in DNA sequences among individuals within a population. Splice site recognition is a crucial step in the process of gene expression, where the coding sequences of genes are joined together to form mature messenger RNA (mRNA). These genetic variants that disrupt genes are believed to be the primary reason for neuro-developmental disorders like ASD (Autism Spectrum Disorder) is a neuro-developmental disorder that is diagnosed in individuals, families, and society and occurs as the developmental delay in one among the hundred genes that are associated with these disorders. Missense variants, premature stop codons, or deletions alter both the quality and quantity of encoded proteins. Predicting genes within exons and introns presents main challenges, such as dealing with sequencing errors, short reads, incomplete genes, overlapping, and more. Although many traditional techniques have been utilized in creating an exon prediction system, the primary challenge lies in accurately identifying the length and spliced strand location classification of exons in conjunction with introns. From now on, the suggested approach utilizes a Deep Learning algorithm to analyze intricate and extensive genomic datasets. M-LSTM is utilized to categorize three binary combinations (EI as 1, IE as 2, and none as 3) using spliced DNA strands. The M-LSTM system is able to sequence extensive datasets, ensuring that long information can be stored without any impact on the current input or output. This enables it to recognize and address long-term connections and problems with rapidly increasing gradients. The proposed model is compared internally with Naïve Bayes and Random Forest to assess its efficacy. Additionally, the proposed model's performance is forecasted by utilizing probabilistic parameters like recall, F1-score, precision, and accuracy to assess the effectiveness of the proposed system.

Keywords: Block gate; Deep learning; Divergent gate; Exon; Genome; Intron; LSTM; Merge gate; Mutations; Splicing.

MeSH terms

  • Algorithms
  • Autism Spectrum Disorder / genetics
  • Computational Biology / methods
  • Deep Learning
  • Exons* / genetics
  • Humans
  • Introns* / genetics
  • RNA Splice Sites*
  • RNA Splicing

Substances

  • RNA Splice Sites