New proposal of viral genome representation applied in the classification of SARS-CoV-2 with deep learning

BMC Bioinformatics. 2023 Mar 11;24(1):92. doi: 10.1186/s12859-023-05188-1.

Abstract

Background: In December 2019, the first case of COVID-19 was described in Wuhan, China, and by July 2022, there were already 540 million confirmed cases. Due to the rapid spread of the virus, the scientific community has made efforts to develop techniques for the viral classification of SARS-CoV-2.

Results: In this context, we developed a new gene sequence representation based on Genomic Signal Processing techniques. First, we applied the mapping approach to samples of six viral species of the Coronaviridae family, to which SARS-CoV-2 belongs. We then fed the downsized sequences obtained by the proposed method into a deep learning architecture for viral classification, achieving accuracies of 98.35%, 99.08%, and 99.69% for viral signatures of size 64, 128, and 256, respectively, and a precision of 99.95% for the vectors of size 256.
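
As a rough illustration of the kind of pipeline the Results describe (a genome mapped to a numeric signal, then reduced to a fixed-size signature of 64, 128, or 256 values), the sketch below uses a chaos game representation (CGR) followed by a discrete Fourier transform (DFT), two terms that appear in the keywords. The abstract does not specify the mapping, so the corner assignment, the complex-valued trajectory, the bin averaging, and the names `cgr_signal` and `signature` are all assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np

# Hypothetical corner assignment for the chaos game; the abstract does not give one.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_signal(seq):
    """Return the chaos-game trajectory of `seq` as a complex-valued signal."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append(complex(x, y))
    return np.array(points, dtype=complex)


def signature(seq, size=256):
    """Reduce a sequence to `size` values (assumed downsizing scheme).

    The DFT magnitude spectrum of the CGR trajectory is averaged over
    `size` contiguous bins. Short sequences are zero-padded by the FFT
    so that every bin contains at least one coefficient.
    """
    signal = cgr_signal(seq)
    if signal.size == 0:
        raise ValueError("sequence contains no A/C/G/T symbols")
    spectrum = np.abs(np.fft.fft(signal, n=max(signal.size, size)))
    bins = np.array_split(spectrum, size)
    return np.array([b.mean() for b in bins])


if __name__ == "__main__":
    toy_genome = "ACGTTGCAAGGCTA" * 50  # toy input, not real viral data
    for size in (64, 128, 256):
        print(size, signature(toy_genome, size).shape)
```

Vectors of this kind have a fixed length regardless of genome length, which is what would allow them to be fed to a classifier with a fixed input size.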

Conclusions: Compared with the results produced using other state-of-the-art representation techniques, the classification results demonstrate that the proposed mapping can deliver satisfactory performance with low memory and processing-time costs.

Keywords: CGR DFT; COVID-19; Deep learning; GSP; SARS-CoV-2.

MeSH terms

  • COVID-19* / genetics
  • Deep Learning*
  • Genome, Viral
  • Genomics
  • Humans
  • SARS-CoV-2 / genetics