Genomic representation predicts an asymptotic host adaptation of bat coronaviruses using deep learning

Front Microbiol. 2023 May 5:14:1157608. doi: 10.3389/fmicb.2023.1157608. eCollection 2023.

Abstract

Introduction: Coronaviruses (CoVs) occur naturally in bats and can occasionally infect and spread among humans and other mammals. Our study aimed to build a deep learning (DL) method to predict the adaptation of bat CoVs to other mammals.

Methods: Each CoV genome was represented by the dinucleotide composition representation (DCR) of its two main viral genes, ORF1ab and Spike. The distribution of DCR features across adaptive hosts was first analyzed, and the features were then used to train a DL classifier based on convolutional neural networks (CNN) to predict the host adaptation of bat CoVs.
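To make the DCR idea concrete, here is a minimal sketch, assuming DCR reduces each gene to the relative frequencies of its 16 overlapping dinucleotides and that the ORF1ab and Spike vectors are simply concatenated. The abstract does not specify the exact construction (for example, any windowing or matrix layout fed to the CNN), so `dinucleotide_composition` and `dcr_features` are hypothetical names for illustration only, not the authors' code.

```python
from itertools import product

# All 16 possible dinucleotides in a fixed order: AA, AC, AG, AT, CA, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq: str) -> list[float]:
    """Relative frequency of each of the 16 dinucleotides in `seq`.

    Overlapping pairs are counted; pairs containing ambiguous bases
    (e.g. N) are skipped. Returns all zeros if no valid pair exists.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs with non-ACGT characters
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def dcr_features(orf1ab: str, spike: str) -> list[float]:
    """Hypothetical genome-level vector: ORF1ab composition, then Spike (32 values)."""
    return dinucleotide_composition(orf1ab) + dinucleotide_composition(spike)
```

For example, `dinucleotide_composition("ACGTAC")` counts the five overlapping pairs AC, CG, GT, TA, AC, so AC gets 0.4 and CG, GT and TA each get 0.2. Relative frequencies (rather than raw counts) keep genes of very different lengths, such as ORF1ab and Spike, on a comparable scale.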

Results and discussion: The results demonstrated inter-host separation and intra-host clustering of DCR-represented CoVs for six host types: Artiodactyla, Carnivora, Chiroptera, Primates, Rodentia/Lagomorpha, and Suiformes. Trained with five host labels (Chiroptera excluded), the DCR-based CNN predicted that bat CoVs adapt predominantly to Artiodactyla hosts, then to Carnivora and Rodentia/Lagomorpha mammals, and finally to Primates. Moreover, all CoVs except those of Suiformes showed a linear asymptotic adaptation from Artiodactyla to Carnivora and Rodentia/Lagomorpha and then to Primates, indicating an asymptotic bats-other mammals-human adaptation.
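The prediction step can be sketched as follows, assuming a classifier trained only on the five non-bat host labels outputs one softmax probability vector per bat CoV, and that "dominant adaptation" is read as the share of bat CoVs assigned to each host by argmax. The abstract does not state how predictions were aggregated (averaging probabilities would be an equally plausible choice), the CNN itself is not shown, and `predicted_host_shares` is a hypothetical helper.

```python
from collections import Counter

# Five non-bat host labels used to train the classifier (Chiroptera excluded).
HOSTS = ["Artiodactyla", "Carnivora", "Primates", "Rodentia/Lagomorpha", "Suiformes"]


def predicted_host_shares(probabilities: list[list[float]]) -> dict[str, float]:
    """Fraction of bat CoV sequences assigned to each host by argmax.

    `probabilities` holds one softmax output per bat CoV, ordered as HOSTS.
    The model that produced these outputs is assumed, not implemented here.
    """
    if not probabilities:
        return {h: 0.0 for h in HOSTS}
    labels = []
    for p in probabilities:
        if len(p) != len(HOSTS):
            raise ValueError("each probability vector must have one entry per host")
        best = max(range(len(HOSTS)), key=p.__getitem__)  # first index wins ties
        labels.append(HOSTS[best])
    counts = Counter(labels)
    n = len(probabilities)
    return {h: counts[h] / n for h in HOSTS}
```

With four made-up outputs (illustrative values, not results from the study), two peaking at Artiodactyla and one each at Carnivora and Primates, the helper returns shares of 0.5, 0.25 and 0.25 for those hosts and 0.0 for the rest; ranking hosts by such shares is one way to express an ordering like the one reported above.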

Conclusion: Genomic dinucleotides represented as DCR show host-specific separation and clustering, and a DCR-based deep learning classifier predicts a linear asymptotic adaptation shift of bat CoVs through other mammals to humans.

Keywords: asymptotic adaptation; bat coronavirus; convolutional neural networks; deep learning; dinucleotide composition representation (DCR).

Grants and funding

This study was supported by grants from the National Key Research and Development Program of China (Grant Nos. 2021YFC2302004, 2021YFC0863400, 2019YFC1200501, and 2018YFA0903000) and the National Natural Science Foundation of China (Grant No. 32070166).