A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests

Bioinformatics. 2017 Aug 1;33(15):2330-2336. doi: 10.1093/bioinformatics/btx130.

Abstract

Motivation: Increasing amounts of whole exome or genome sequencing data present the challenge of analysing rare variants with extremely small minor allele frequencies. Various statistical tests have been proposed that increase power for rare variants by conducting the test within a bin, such as a gene or a pathway. However, a gene may contain anywhere from several to thousands of markers, and not all of them are related to the phenotype. Combining functional and non-functional variants in an arbitrarily defined genomic region could reduce testing power.

Results: We propose a Zoom-Focus algorithm (ZFA) to locate the optimal testing region within a given genomic region. It can be applied as a wrapper function around existing rare variant association tests to increase testing power. The algorithm consists of two steps. In the first step, Zooming, the given genomic region is repeatedly partitioned by an order of two (halves, quarters and so on), and the best partition is located. In the second step, Focusing, the boundaries of the zoomed region are refined. Simulation studies showed that ZFA substantially increased the statistical power of rare variant tests, including SKAT, SKAT-O, the burden test and the W-test. The algorithm was applied to real exome sequencing data on hypertensive disorder and identified genetic markers biologically relevant to metabolic disorders that were undetectable by a gene-based method. The proposed algorithm is an efficient and powerful tool for enhancing the power of association studies of whole exome or genome sequencing data.
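The two steps described in the Results can be sketched in Python. This is a minimal, simplified illustration under stated assumptions, not the published ZFA implementation. The names `test_fn`, `zoom`, `focus`, `zoom_focus` and `toy_p` are hypothetical. Zooming here only tests the 2^k equal-size contiguous blocks at each level, Focusing greedily moves one boundary at a time by one variant, and "best" is taken to mean the smallest p-value. A toy score stands in for a real test such as SKAT or a burden test.

```python
import math
from typing import Callable, Tuple

# Hypothetical interface (assumption): test_fn(start, end) returns a p-value
# for an association test run on variants start..end-1, i.e. a half-open
# index range over variants sorted by genomic position.
TestFn = Callable[[int, int], float]
Region = Tuple[int, int, float]  # (start, end, p_value)


def zoom(n_variants: int, test_fn: TestFn, max_level: int) -> Region:
    """Simplified Zooming: at level k, split [0, n_variants) into 2**k
    contiguous blocks of near-equal size, test each block, and keep the
    block with the smallest p-value across all levels (level 0 = whole region)."""
    best: Region = (0, n_variants, test_fn(0, n_variants))
    for level in range(1, max_level + 1):
        n_blocks = 2 ** level
        if n_blocks > n_variants:
            break  # further splitting would create empty blocks
        for b in range(n_blocks):
            start = b * n_variants // n_blocks
            end = (b + 1) * n_variants // n_blocks
            p = test_fn(start, end)
            if p < best[2]:
                best = (start, end, p)
    return best


def focus(region: Region, n_variants: int, test_fn: TestFn, step: int = 1) -> Region:
    """Simplified Focusing: greedily shift either boundary inward or outward
    by `step` variants for as long as the p-value keeps decreasing."""
    start, end, p = region
    improved = True
    while improved:
        improved = False
        for s, e in ((start - step, end), (start + step, end),
                     (start, end - step), (start, end + step)):
            if 0 <= s < e <= n_variants:
                q = test_fn(s, e)
                if q < p:
                    start, end, p = s, e, q
                    improved = True
                    break  # restart the search from the new boundaries
    return (start, end, p)


def zoom_focus(n_variants: int, test_fn: TestFn, max_level: int = 4) -> Region:
    # Run Zooming first, then refine the zoomed block's boundaries.
    return focus(zoom(n_variants, test_fn, max_level), n_variants, test_fn)


# Toy stand-in for a real test (assumption): variants 10..19 are "causal".
# z = (#causal in region) / sqrt(region length) peaks when the region matches
# the causal block exactly, and p = exp(-z^2 / 2) decreases as z grows.
CAUSAL = set(range(10, 20))


def toy_p(start: int, end: int) -> float:
    hits = sum(1 for i in range(start, end) if i in CAUSAL)
    z = hits / math.sqrt(end - start)
    return math.exp(-z * z / 2)


if __name__ == "__main__":
    print(zoom(64, toy_p, 4)[:2])     # zoomed block: (8, 16)
    print(zoom_focus(64, toy_p)[:2])  # refined region: (10, 20)
```

In this toy run, Zooming lands on a block that overlaps the causal variants but also includes non-causal ones, and Focusing trims and extends its boundaries until it matches the causal block. Because the region is chosen by minimising a p-value over many candidate regions, the final p-value is not a valid nominal p-value; a real analysis would need to account for that selection (for example by permutation), which this sketch omits.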

Availability and implementation: The ZFA software is available at: http://www2.ccrb.cuhk.edu.hk/statgene/software.html.

Contact: maggiew@cuhk.edu.hk or bzee@cuhk.edu.hk.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Computer Simulation
  • Exome
  • Gene Frequency
  • Genetic Association Studies / methods*
  • Genetic Markers
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Genomics / methods
  • Humans
  • Hypertension / genetics
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • Genetic Markers