Statistical selection strategy for risk and protective rare variants associated with complex traits

J Comput Biol. 2015 Nov;22(11):1034-43. doi: 10.1089/cmb.2015.0091. Epub 2015 Oct 15.

Abstract

In genetic association studies with deep sequencing data, precisely locating the rare variants associated with complex diseases or traits is a challenging statistical problem because few genetic mutations are observed. In particular, both risk and protective rare variants can be present in the same gene or genetic region. Although relatively many statistical tests exist to detect the phenotypic association of a group of rare variants, such as a gene or a genetic region, very few statistical methods currently exist to separate causal rare variants from noncausal variants within a disease/trait-related gene or genetic region. In this article, we propose a new statistical selection strategy that can locate causal rare variants within a disease/trait-related gene or genetic region. The proposed procedure linearly combines potential risk and protective variants to find the combination of rare variants with the strongest association signal. Because the procedure is based on forward selection, it is also computationally very efficient. In simulation studies, we demonstrate that the proposed procedure has better selection performance than existing methods when both risk and protective variants are present. We also applied it to real sequencing data on the ANGPTL gene family from the Dallas Heart Study.
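The general idea of a forward selection over signed variant combinations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the test statistic, the weighting, or the stopping rule, so the Pearson correlation between the combined score and a quantitative trait, the unit weights of +1 (risk) and -1 (protective), the `min_gain` threshold, and the function names `assoc_stat` and `forward_select` are all assumptions made for illustration.

```python
import numpy as np


def assoc_stat(score, y):
    """Pearson correlation between a combined variant score and the trait.

    Assumption: a simple correlation stands in for the association
    statistic, which the abstract does not specify.
    """
    if np.std(score) == 0 or np.std(y) == 0:
        return 0.0  # a constant vector carries no association signal
    return float(np.corrcoef(score, y)[0, 1])


def forward_select(G, y, min_gain=1e-9):
    """Greedy forward selection of signed rare variants.

    G: (n_samples, n_variants) array of minor-allele counts.
    y: (n_samples,) quantitative trait.
    Returns ({variant_index: sign}, final_statistic), where sign +1 marks
    a putative risk variant and -1 a putative protective variant.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = G.shape
    score = np.zeros(n)   # current linear combination of selected variants
    selected = {}
    best = 0.0
    while len(selected) < m:
        step = None
        for j in range(m):
            if j in selected:
                continue
            for sign in (1, -1):
                stat = assoc_stat(score + sign * G[:, j], y)
                if step is None or stat > step[0]:
                    step = (stat, j, sign)
        # Hypothetical stopping rule: stop when no candidate improves the signal.
        if step[0] <= best + min_gain:
            break
        best, j, sign = step
        selected[j] = sign
        score = score + sign * G[:, j]
    return selected, best
```

Each pass evaluates at most two candidate scores per remaining variant, so the search costs on the order of m² statistic evaluations for m variants instead of enumerating all 3^m signed subsets, which is where the computational efficiency of a forward search comes from. In practice, a significance-based stopping rule (for example, a permutation test) would likely be needed to keep noise variants out of the selection.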

Keywords: Dallas Heart Study; rare variant; risk and protective variants; sequencing data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Data Interpretation, Statistical*
  • Gene Frequency
  • Genetic Association Studies
  • Genetic Predisposition to Disease
  • Genome, Human
  • Humans
  • Multifactorial Inheritance
  • Protective Factors
  • Risk Factors