RAVAQ: An integrative pipeline from quality control to region-based rare variant association analysis

Genet Epidemiol. 2022 Jul;46(5-6):256-265. doi: 10.1002/gepi.22450. Epub 2022 Apr 14.

Abstract

Next-generation sequencing technologies have made it possible to sequence large samples of cases and controls to test for association with rare variants. To limit cost and increase sample sizes, data from controls could be used in multiple studies and might thus be generated on different sequencing platforms. This can compromise comparability between cases and controls: batch effects may act as confounding factors and lead to false-positive association signals. To limit batch effects and ensure comparability of datasets, stringent quality controls are required. We propose an integrative five-step pipeline, RAVAQ, that (a) performs a specific three-step quality control taking into account the case-control status to ensure data comparability, (b) selects qualifying variants as defined by the user, and (c) performs rare variant association tests per genomic region. The RAVAQ pipeline is wrapped in an R package. It is user-friendly and flexible in its arguments to adapt to the specificity of each research project. We provide examples showing how RAVAQ improves rare variant association tests. The default RAVAQ quality control outperformed the widely used Variant Quality Score Recalibration method, removing inflation due to spurious signals. RAVAQ is open source and freely available at https://gitlab.com/gmarenne/ravaq.
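Inflation of association signals is commonly summarized by the genomic inflation factor, the ratio of the median observed test statistic to its expected median under the null. The abstract does not state which metric the authors used, so the following is only a minimal sketch of that common diagnostic, assuming one p-value per tested region. The function name `genomic_inflation` is hypothetical and is not part of the RAVAQ package.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom,
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return the genomic inflation factor for a list of p-values.

    Hypothetical helper for illustration; not a RAVAQ function.
    Each p-value is converted to the equivalent 1-df chi-square
    statistic, and the median of these statistics is divided by
    the median expected under the null hypothesis.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # Two-sided p-value -> squared normal quantile (1-df chi-square).
        stats.append(_NORMAL.inv_cdf(p / 2.0) ** 2)
    return median(stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    n = 1001
    # Evenly spaced p-values mimic a well-calibrated test.
    calibrated = [(i - 0.5) / n for i in range(1, n + 1)]
    # Squaring shifts p-values toward zero, mimicking spurious signals.
    inflated = [p ** 2 for p in calibrated]
    print("calibrated:", genomic_inflation(calibrated))
    print("inflated:  ", genomic_inflation(inflated))
```

A value close to 1 suggests well-calibrated tests, whereas values clearly above 1 point to an excess of small p-values, for example from batch effects between cases and controls.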

Keywords: free open-source package; method development; quality control; rare variant association testing; sequencing data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Genome
  • Genomics*
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Quality Control
  • Software