A multi-scenario genome-wide medical population genetics simulation framework

Bioinformatics. 2017 Oct 1;33(19):2995-3002. doi: 10.1093/bioinformatics/btx369.

Abstract

Motivation: Recent technological advances in high-throughput sequencing and genotyping have improved our understanding of genomic structure and disease-associated genetic factors. In this context, simulation models can play a critical role in revealing evolutionary and demographic effects on genomic variation, enabling researchers to assess existing analytical approaches and design novel ones. Although various simulation frameworks have been proposed, they do not account for natural selection in admixture processes. Most are tailored to a single chromosome or genomic region, very few capture large-scale genomic data, and most are not readily accessible to the genomics community.

Results: Here we develop a multi-scenario genome-wide medical population genetics simulation framework called 'FractalSIM'. FractalSIM can accurately mimic and generate genome-wide data under various genetic models of genetic diversity, disease-related genomic variation and DNA sequence patterns in admixed and/or homogeneous populations. Moreover, the framework accounts for natural selection in both homogeneous and admixture processes. The outputs of FractalSIM have been assessed using popular tools, and the results demonstrate its capability to accurately mimic real scenarios. These outputs can be used to evaluate the performance of a range of genomic tools, from ancestry inference to genome-wide association studies.
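
The abstract does not describe FractalSIM's internals. Purely as an illustration of what selection acting within an admixture process can look like at a single locus, the sketch below uses a textbook Wright-Fisher model: one admixture pulse mixes two source populations, then genic selection and binomial drift act each generation. This is not FractalSIM's algorithm or interface; every function name, parameter and value here is hypothetical.

```python
import random


def admixed_frequency(p_a, p_b, alpha):
    """Allele frequency right after one admixture pulse.

    alpha is the ancestry proportion contributed by source population A
    (hypothetical parameter, not taken from FractalSIM).
    """
    return alpha * p_a + (1.0 - alpha) * p_b


def select(p, s):
    """Deterministic genic selection: the allele has relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + s * p)


def drift(p, n_copies, rng):
    """Wright-Fisher drift: binomial sampling of n_copies gene copies."""
    hits = sum(1 for _ in range(n_copies) if rng.random() < p)
    return hits / n_copies


def simulate(p_a, p_b, alpha, s, n_diploid, generations, seed=0):
    """Return the allele-frequency trajectory in the admixed population."""
    rng = random.Random(seed)
    p = admixed_frequency(p_a, p_b, alpha)
    trajectory = [p]
    for _ in range(generations):
        # Selection first, then sampling of 2N gene copies for the next generation.
        p = drift(select(p, s), 2 * n_diploid, rng)
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    traj = simulate(p_a=0.1, p_b=0.6, alpha=0.3, s=0.05,
                    n_diploid=500, generations=50)
    print(f"start={traj[0]:.3f} end={traj[-1]:.3f}")
```

Setting `s=0` in this sketch reduces it to neutral drift after admixture, which gives a simple baseline for comparing trajectories with and without selection.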

Availability and implementation: The FractalSIM package is available at http://www.cbio.uct.ac.za/FractalSIM.

Contact: emile.chimusa@uct.ac.za.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Genetic Variation
  • Genetics, Population / methods*
  • Genome
  • Genome-Wide Association Study
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Polymorphism, Single Nucleotide
  • Selection, Genetic
  • Sequence Analysis, DNA
  • Software