Genomic object detection: An improved approach for transposable elements detection and classification using convolutional neural networks

PLoS One. 2023 Sep 21;18(9):e0291925. doi: 10.1371/journal.pone.0291925. eCollection 2023.

Abstract

Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools for these tasks, machine learning (ML) approaches that leverage computing resources, such as graphics processing units (GPUs) and multiple central processing unit (CPU) cores, have been adopted. However, until now, the use of ML techniques has mostly been limited to the classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects by predicting their position, length, and class in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR-retrotransposons are used to train the proposed neural network. Performance was measured with precision, recall, accuracy, F1-score, execution times, and time ratios, complemented by several graphical representations. These promising results open the door for a new generation of Deep Learning tools for genomics. The YORO architecture is available at https://github.com/simonorozcoarias/YORO.
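As an illustration of the classification metrics named in the abstract, the following minimal sketch computes precision, recall, accuracy, and F1-score from per-nucleotide labels. It is not taken from the YORO repository: the function names `confusion_counts` and `detection_metrics`, the binary per-base masks, and the choice to score at nucleotide level are all assumptions made for this example.

```python
def confusion_counts(true_mask, pred_mask):
    """Count TP, FP, TN, FN for two equal-length 0/1 masks.

    Assumption: 1 marks a base inside an annotated domain, 0 a base outside.
    """
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_mask, pred_mask):
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def detection_metrics(tp, fp, tn, fn):
    """Return precision, recall, accuracy and F1; empty denominators give 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical toy masks over an 8-base window (not data from the paper).
true_mask = [0, 1, 1, 1, 0, 0, 1, 1]
pred_mask = [0, 1, 1, 0, 0, 1, 1, 1]
counts = confusion_counts(true_mask, pred_mask)
metrics = detection_metrics(*counts)
print(counts, metrics)
```

For the toy masks, the counts are 4 true positives, 1 false positive, 2 true negatives, and 1 false negative. Scoring per predicted object rather than per base would change the counts but not the formulas.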

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • DNA Transposable Elements* / genetics
  • Eukaryota
  • Genomics*
  • Neural Networks, Computer

Substances

  • DNA Transposable Elements

Grants and funding

The authors were supported by Universidad Autónoma de Manizales, Manizales, Colombia, under project 752-115 and by the Ministry of Science, Technology and Innovation (Minciencias) of Colombia under grant 907-2021. This work was supported by Minciencias-Ecos Nord grants No. C21MA01 and 285-2021 and by STICAMSUD 21-STIC-13 (Call STIC AmSud Latin America with France, Colombia, Chile, and Brazil from TELearning Project 2021-22). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.