WGDdetector: a pipeline for detecting whole genome duplication events using the genome or transcriptome annotations

BMC Bioinformatics. 2019 Feb 13;20(1):75. doi: 10.1186/s12859-019-2670-3.

Abstract

Background: With well-assembled genomes available for a growing number of organisms, identifying the bioinformatic basis of whole genome duplication (WGD) is a growing field of genomics. Most existing software for detecting the footprints of WGDs is restricted to well-assembled genomes. However, the large number of poor-quality genomes and the more accessible transcriptomes have been largely ignored, although in theory they can also contribute to detecting WGDs with the dS-based method. To address this gap, we designed WGDdetector, a simple and universal tool that detects WGDs from either genome or transcriptome annotations in different organisms using the widely used dS-based method.

Results: We constructed the WGDdetector pipeline, which integrates all analyses, including gene family construction, dS estimation and phasing, and output of the dS value for each paralog pair, and runs them with a single command. We then chose four species (Arabidopsis thaliana, Juglans regia, Populus trichocarpa and Xenopus laevis), representing herbaceous plants, woody plants and animals, to test its practicability. Our final results were highly consistent with those of previous studies for both genome and transcriptome data.
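
As an illustration of how the dS values of paralog pairs are typically read for WGD signals, the following minimal Python sketch bins a list of dS values and reports the most populated bin. It is not WGDdetector's code: the function names `ds_histogram` and `peak_bin`, the bin width of 0.1, the cutoff of 3.0 for saturated dS values, and the example values are all assumptions made for illustration.

```python
from collections import Counter


def ds_histogram(ds_values, bin_width=0.1, max_ds=3.0):
    """Count dS values per bin, skipping values outside (0, max_ds].

    bin_width and max_ds are illustrative defaults, not values from the paper.
    """
    counts = Counter()
    for ds in ds_values:
        if 0 < ds <= max_ds:
            counts[int(ds / bin_width)] += 1
    return counts


def peak_bin(counts, bin_width=0.1):
    """Return the (lower, upper) dS bounds of the most populated bin."""
    idx = max(counts, key=counts.get)
    return (round(idx * bin_width, 6), round((idx + 1) * bin_width, 6))


# Hypothetical dS values for a handful of paralog pairs.
example_ds = [0.05, 0.72, 0.75, 0.78, 0.71, 1.55, 4.2]
counts = ds_histogram(example_ds)
peak = peak_bin(counts)
```

In this toy input, four of the pairs fall in the same bin, so that bin is reported as the peak. A real analysis would use many thousands of pairs and would usually fit or smooth the distribution rather than take the single tallest bin.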

Conclusion: WGDdetector is not only reliable and stable for genome data, but also offers a new way to use transcriptome data to obtain the correct dS distribution for detecting WGDs. The source code is freely available and is implemented for Windows and Linux operating systems.

Keywords: Genome; Transcriptome; Whole genome duplication; dS.

MeSH terms

  • Animals
  • Arabidopsis / genetics
  • Gene Duplication*
  • Genome*
  • Genomics / methods
  • Juglans / genetics
  • Molecular Sequence Annotation
  • Populus / genetics
  • Software*
  • Transcriptome*
  • Xenopus laevis