Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data

J Comput Biol. 2022 Sep;29(9):943-960. doi: 10.1089/cmb.2021.0447. Epub 2022 May 30.

Abstract

Natural selection has received considerable attention because it underlies the adaptation of populations to their biotic and abiotic environments. An allele is said to be selected when natural selection favors it. The favored allele then increases in frequency in the population, and variation at neighboring linked sites diminishes, producing a so-called selective sweep. High-throughput genome sequencing makes it possible to disentangle the evolutionary forces acting on populations, and as these technologies have developed it has become easier to detect selective sweeps, also called selection signatures. Methods for detecting selective sweeps range from simple implementations based on summary statistics to complex statistical approaches. An important problem with these statistical models is that they can give inaccurate results when their assumptions are violated. Machine learning (ML) has been introduced in population genetics as an alternative that treats the detection of selection signatures as a classification problem. As population genomics data become increasingly available, researchers may incorporate ML into these statistical models to infer signatures of selection with higher predictive accuracy and better resolution. This article describes how ML can aid in detecting and studying patterns of natural selection using population genomic data.
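
As an illustration of the summary-statistic end of this spectrum, the sketch below computes three classical statistics for one genomic window: the number of segregating sites, the mean number of pairwise differences, and Tajima's D. The abstract does not name these statistics, so their choice is an assumption made here, as are the function name `window_features` and the 0/1 haplotype input format. In a classification setting, vectors like these could serve as input features for a model trained on windows labeled as sweep or neutral.

```python
import math


def window_features(haplotypes):
    """Return (S, pi, tajimas_d) for one genomic window.

    haplotypes: list of equal-length sequences of 0/1 alleles, one per
    sampled chromosome (a hypothetical input format for this sketch).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    num_sites = len(haplotypes[0])

    # Number of copies of the allele coded 1 at each site.
    counts = [sum(h[j] for h in haplotypes) for j in range(num_sites)]

    # A site is segregating when both alleles occur in the sample.
    seg = [k for k in counts if 0 < k < n]
    S = len(seg)

    # Mean number of differences over all n*(n-1)/2 pairs of haplotypes.
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in seg) / pairs

    if S == 0:
        # Tajima's D is undefined without variation; 0.0 is a placeholder.
        return 0, 0.0, 0.0

    # Constants from Tajima's (1989) definition of D.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    var = e1 * S + e2 * S * (S - 1)
    d = (pi - S / a1) / math.sqrt(var) if var > 0 else 0.0
    return S, pi, d


# Toy windows: one dominated by rare variants, one by
# intermediate-frequency variants.
rare_variants = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
intermediate = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(window_features(rare_variants))
print(window_features(intermediate))
```

An excess of rare variants, as in the first toy window, gives a negative Tajima's D, a pattern commonly associated with a recent sweep. The second window gives a positive value. Real analyses would use many more samples and sites, and demographic history can produce similar values, which is one reason single statistics can mislead when model assumptions are violated.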

Keywords: deep learning; genomics; machine learning; natural selection; selection signature; training model.

Publication types

  • Review

MeSH terms

  • Genetics, Population
  • Genomics / methods
  • Machine Learning
  • Metagenomics*
  • Selection, Genetic*