Inferring Historical Introgression with Deep Learning

Syst Biol. 2023 Nov 1;72(5):1013-1038. doi: 10.1093/sysbio/syad033.

Abstract

Resolving phylogenetic relationships among taxa remains a challenge in the era of big data due to the presence of genetic admixture in a wide range of organisms. Rapidly developing sequencing technologies and statistical tests enable evolutionary relationships to be disentangled at a genome-wide level, yet many of these tests are computationally intensive and rely on phased genotypes, large sample sizes, restricted phylogenetic topologies, or hypothesis testing. To overcome these difficulties, we developed a deep learning-based approach, named ERICA, for inferring genome-wide evolutionary relationships and local introgressed regions from sequence data. ERICA accepts sequence alignments of both population genomic data and multiple genome assemblies, and efficiently identifies discordant genealogy patterns and exchanged regions across genomes when compared with other methods. We further tested ERICA using real population genomic data from Heliconius butterflies that have undergone adaptive radiation and frequent hybridization. Finally, we applied ERICA to characterize hybridization and introgression in wild and cultivated rice, revealing the important role of introgression in rice domestication and adaptation. Taken together, our findings demonstrate that ERICA provides an effective method for teasing apart evolutionary relationships using whole genome data, which can ultimately facilitate evolutionary studies on hybridization and introgression.

Keywords: Convolutional neural network; deep learning; evolutionary relationship; hybridization; introgression.

MeSH terms

  • Animals
  • Biological Evolution
  • Butterflies* / genetics
  • Deep Learning*
  • Gene Flow
  • Hybridization, Genetic
  • Phylogeny