genepopedit: a simple and flexible tool for manipulating multilocus molecular data in R

Mol Ecol Resour. 2017 Jan;17(1):12-18. doi: 10.1111/1755-0998.12569. Epub 2016 Aug 11.

Abstract

Advances in genetic sequencing technologies and techniques have made large, genome-wide data sets composed of hundreds or even thousands of individuals and loci the norm rather than the exception, even for nonmodel organisms. While such data present new opportunities for evaluating population structure and demographic processes, their size brings new computational challenges for researchers who need to parse, convert and manipulate data, often into the variety of software-specific formats required by genomic analysis programs. We developed genepopedit as a flexible tool for manipulating multilocus molecular data sets. Its functionality can be divided among diagnostic, manipulation, sampling, simulation and transformation tools. Metadata from large genomic data sets can be extracted efficiently, without the need to view the data in a text editor. genepopedit provides tools to manipulate the loci, individual samples and populations included in genomic data sets, and can convert data directly into a variety of software formats. The functions are compiled as an R package, which can be integrated into existing analysis workflows. Importantly, genepopedit provides a simple yet robust code-based tool for repeatable genomic data manipulation, and it has been shown to be stable for data sets in excess of 200 000 SNPs. The latest version of the package and associated documentation are available on GitHub (github.com/rystanley/genepopedit).
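As its name suggests, genepopedit works with data in the Genepop file format. To illustrate what extracting metadata without opening a file in a text editor involves, here is a minimal sketch, written in Python rather than R. It counts loci, populations and individuals and checks that each individual carries one genotype per locus. It assumes the standard Genepop layout: a title line, then locus names (comma-separated on one line or one per line), then one or more blocks, each beginning with a `Pop` line and followed by `individual , genotype genotype ...` lines. This is a hypothetical illustration, not genepopedit's implementation or API; `summarise_genepop` and `GenepopSummary` are names invented for this example.

```python
# Hypothetical sketch: summarise a Genepop-format data set.
# Not genepopedit's API; assumes the standard Genepop layout described above.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GenepopSummary:
    title: str
    loci: list[str]
    pop_sizes: list[int] = field(default_factory=list)    # individuals per Pop block
    individuals: list[str] = field(default_factory=list)  # sample IDs in file order

    @property
    def n_individuals(self) -> int:
        return len(self.individuals)


def summarise_genepop(text: str) -> GenepopSummary:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty Genepop input")
    title, body = lines[0], lines[1:]

    loci: list[str] = []
    summary = None
    for line in body:
        if line.lower() == "pop":
            # The first 'Pop' line ends the locus-name section
            if summary is None:
                summary = GenepopSummary(title=title, loci=loci)
            summary.pop_sizes.append(0)
            continue
        if summary is None:
            # Still in the locus-name section; names may be comma-separated
            loci.extend(name.strip() for name in line.split(",") if name.strip())
            continue
        # Individual line: '<name> , <genotype> <genotype> ...'
        name, sep, genotypes = line.partition(",")
        if not sep:
            raise ValueError(f"individual line lacks a comma: {line!r}")
        n_genotypes = len(genotypes.split())
        if n_genotypes != len(loci):
            raise ValueError(
                f"{name.strip()}: {n_genotypes} genotypes for {len(loci)} loci"
            )
        summary.individuals.append(name.strip())
        summary.pop_sizes[-1] += 1

    if summary is None:
        raise ValueError("no 'Pop' line found")
    return summary


if __name__ == "__main__":
    example = """Example data set
locA, locB, locC
Pop
BON_01 , 0101 0102 0202
BON_02 , 0102 0102 0101
Pop
SYD_01 , 0202 0101 0102
"""
    s = summarise_genepop(example)
    print(s.loci, s.pop_sizes, s.n_individuals)
    # prints: ['locA', 'locB', 'locC'] [2, 1] 3
```

A streaming, line-by-line pass like this keeps memory use proportional to the number of individuals rather than to the full genotype matrix, which matters for files with hundreds of thousands of SNPs; the genotype-count check catches truncated or misaligned rows early.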

Keywords: R statistics; conservation genetics; data conversion; data manipulation; loci selection; population genetics.

MeSH terms

  • Computational Biology / methods*
  • Electronic Data Processing / methods*
  • Genetic Loci*
  • Genomics / methods*
  • Software*