New Software for the Fast Estimation of Population Recombination Rates (FastEPRR) in the Genomic Era

G3 (Bethesda). 2016 Jun 1;6(6):1563-71. doi: 10.1534/g3.116.028233.

Abstract

Genetic recombination is a very important evolutionary mechanism that mixes parental haplotypes and produces new raw material for organismal evolution. As a result, information on recombination rates is critical for biological research. In this paper, we introduce a new, extremely fast open-source software package (FastEPRR) that uses machine learning to estimate recombination rate [Formula: see text] (=[Formula: see text]) from intraspecific DNA polymorphism data. When [Formula: see text] and the number of sampled diploid individuals is large enough ([Formula: see text]), the variance of [Formula: see text] remains slightly smaller than that of [Formula: see text]. The new estimate [Formula: see text] (calculated by averaging [Formula: see text] and [Formula: see text]) has the smallest variance in all cases. When estimating [Formula: see text], the finite-site model was employed to analyze cases with a high rate of recurrent mutations, and an additional method is proposed to consider the effect of variable recombination rates within windows. Simulations encompassing a wide range of parameters demonstrate that different evolutionary factors, such as demography and selection, may not increase the false positive rate of recombination hotspots. Overall, FastEPRR is similar in accuracy to the well-known method LDhat but requires far less computation time. Genetic maps for each human population (YRI, CEU, and CHB) extracted from the 1000 Genomes OMNI data set were obtained in less than 3 days using just a single CPU core. The Pearson pairwise correlation coefficient between the [Formula: see text] and [Formula: see text] maps is very high, ranging between 0.929 and 0.987 at a 5-Mb scale. Considering that sample sizes for these kinds of data are increasing dramatically with advances in next-generation sequencing technologies, FastEPRR (freely available at http://www.picb.ac.cn/evolgen/) is expected to become a widely used tool for establishing genetic maps and studying recombination hotspots in the population genomic era.
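
The abstract mentions two simple operations: averaging two recombination-rate estimates into a combined estimate, and comparing two genetic maps by their Pearson correlation at a coarse scale. The Python sketch below illustrates both on per-window values. It is not FastEPRR's code or interface. The function names, the equal-width windows, the unweighted binning, and the numbers are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from statistics import mean


def combine_estimates(rho_a, rho_b):
    """Average two per-window estimates element by element.

    Assumes both lists cover the same windows in the same order.
    """
    return [(a + b) / 2 for a, b in zip(rho_a, rho_b)]


def bin_map(values, windows_per_bin):
    """Coarsen a map by averaging consecutive groups of windows.

    Assumption: windows have equal width, so an unweighted mean is used.
    A trailing partial group is averaged over the windows it contains.
    """
    return [
        mean(values[i:i + windows_per_bin])
        for i in range(0, len(values), windows_per_bin)
    ]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up per-window values for two hypothetical methods.
    map_a = [1.2, 0.8, 3.1, 2.9, 0.5, 0.7, 4.0, 3.6]
    map_b = [1.0, 1.1, 2.7, 3.3, 0.4, 0.9, 3.8, 4.1]

    combined = combine_estimates(map_a, map_b)
    coarse_a = bin_map(map_a, windows_per_bin=2)
    coarse_b = bin_map(map_b, windows_per_bin=2)

    print("combined estimate:", combined)
    print("correlation at coarse scale:", pearson(coarse_a, coarse_b))
```

Averaging within bins before correlating smooths window-level noise, so correlations at coarse scales are usually higher than at fine scales. The scale therefore needs to be stated alongside any such coefficient, as the abstract does for its 5-Mb figure.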

Keywords: boosting; fast estimation; genetic map; genomic era; population recombination rates.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genetics, Population* / methods
  • Genome
  • Genomics* / methods
  • Haplotypes
  • Humans
  • Polymorphism, Genetic
  • Recombination, Genetic*
  • Reproducibility of Results
  • Software*