Boosting signals in gene-based association studies via efficient SNP selection

Brief Bioinform. 2014 Mar;15(2):279-91. doi: 10.1093/bib/bbs087. Epub 2013 Jan 15.

Abstract

Set-based association studies that group variants by gene or pathway have shown great promise in interpreting association signals for complex diseases. These approaches are particularly useful when the variants in a set have moderate effects that are difficult to detect with single-marker analysis, especially when the variants act jointly in a complicated manner. Set-based analyses use a summary statistic, such as the maximum or average of the individual signals (e.g. a chi-square statistic) over all variants in a set, or consider the joint distribution of those signals to assess the significance of the set. The signal obtained in this way, however, can be diluted when noisy variants are not properly handled, leading to inflated false negatives or false positives. Thus, the selection of disease-informative single-nucleotide polymorphisms (diSNPs) plays a crucial role in improving the power of a set-based association study. In this work, we propose an efficient diSNP selection method based on information theory. We select diSNPs by considering their relative information contribution to disease status, which differs from the usual tag SNP selection. The relative merit of pre-selecting diSNPs in a set-based association analysis is demonstrated through extensive simulation studies and real data analysis.
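
As a rough illustration of the idea of scoring a variant by the information it carries about disease status, the sketch below ranks SNPs by plain empirical mutual information between genotype and case/control status. It is not the selection procedure proposed in the paper, whose details the abstract does not give. The function names (entropy, mutual_information, rank_snps), the 0/1/2 genotype coding and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(genotypes, status):
    """Empirical I(G; D) = H(G) + H(D) - H(G, D) for one SNP."""
    joint = list(zip(genotypes, status))
    return entropy(genotypes) + entropy(status) - entropy(joint)


def rank_snps(genotype_columns, status):
    """Return (snp_name, mutual_information) pairs, most informative first."""
    scores = {
        name: mutual_information(genotypes, status)
        for name, genotypes in genotype_columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 4 cases (1) and 4 controls (0).
# Genotypes are coded as minor-allele counts 0/1/2 (an assumed coding).
status = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 1, 2, 0, 0, 1, 0],  # genotype tracks disease status
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # genotype independent of status
}

ranking = rank_snps(snps, status)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy set, snp_a ranks above snp_b because its genotype distribution differs between cases and controls. A set-level statistic such as the maximum or average chi-square could then be computed over only the top-ranked variants. How many variants to keep, and how to account for correlation between them, are choices this sketch leaves open.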

Keywords: entropy; gene-centric association; mutual information; set-based association.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Collagen Type I / genetics
  • Computational Biology / methods*
  • Computer Simulation
  • Disease / genetics
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Infant, Newborn
  • Infant, Small for Gestational Age
  • Information Theory
  • Linkage Disequilibrium
  • Models, Genetic
  • Models, Statistical
  • Polymorphism, Single Nucleotide*

Substances

  • Collagen Type I