phylogatR: Phylogeographic data aggregation and repurposing

Mol Ecol Resour. 2022 Nov;22(8):2830-2842. doi: 10.1111/1755-0998.13673. Epub 2022 Jul 12.

Abstract

Patterns of genetic diversity within a species contain information about the history of that species, including how it has responded to historical climate change and how easily the organism is able to disperse across its habitat. More than 40,000 phylogeographic and population genetic investigations have been published to date, each collecting genetic data from hundreds of samples. Despite these millions of data points, meta-analyses are challenging because the synthesis of results across hundreds of studies, each using different methods and forms of analysis, is a daunting and time-consuming task. It is more efficient to proceed by repurposing existing data and using automated data analysis. To facilitate data repurposing, we created a database (phylogatR) that aggregates data from different sources and conducts automated multiple sequence alignments and data curation to provide users with nearly ready-to-analyse sets of data for thousands of species. Two types of scientific research will be made easier by phylogatR: large meta-analyses of thousands of species that can address classic questions in evolutionary biology and ecology, and student- or citizen-science-based investigations that will introduce a broad range of people to the analysis of genetic data. phylogatR enhances the value of existing data via the creation of software and web-based tools that enable these data to be recycled and reanalysed, and that increase accessibility to big data for research laboratories and classroom instructors with limited computational expertise and resources.

Keywords: biodiversity informatics; data repurposing; genetic diversity; macrogenetics; open science.

MeSH terms

  • Data Aggregation*
  • Ecology* / methods
  • Ecosystem
  • Humans
  • Phylogeography
  • Software