Exact association test for small size sequencing data

BMC Med Genomics. 2018 Apr 20;11(Suppl 2):30. doi: 10.1186/s12920-018-0344-z.

Abstract

Background: Recent statistical methods for next generation sequencing (NGS) data have been successfully applied to identifying rare genetic variants associated with certain diseases. However, the most commonly used methods (e.g., burden tests and variance-component tests) rely on large sample sizes. Because NGS is still costly, NGS datasets are generally restricted to small sample sizes, which most existing methods cannot analyze.

Methods: In this work, we propose a new exact association test for sequencing data that does not require a large-sample approximation and is applicable to both common and rare variants. Our method, based on the Generalized Cochran-Mantel-Haenszel (GCMH) statistic, was applied to NGS datasets from intraductal papillary mucinous neoplasm (IPMN) patients. IPMN is a unique pancreatic cancer subtype that can turn into an invasive and hard-to-treat metastatic disease.
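
The abstract does not spell out the GCMH-based procedure, so the following is only a minimal sketch of a related classical idea, not the authors' method: an exact conditional test for association across stratified 2x2 tables, in which the null distribution of the summed top-left cell counts is obtained by convolving per-stratum hypergeometric distributions instead of relying on a large-sample chi-square approximation. The function names (`hypergeom_pmf`, `exact_stratified_test`) and the table layout (rows as case/control, columns as carrier/non-carrier) are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(table):
    """Exact null distribution of the top-left cell of one 2x2 table,
    conditional on its row and column totals (hypergeometric)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = comb(row1 + row2, col1)
    return {
        x: Fraction(comb(row1, x) * comb(row2, col1 - x), total)
        for x in range(lo, hi + 1)
    }


def exact_stratified_test(tables):
    """Two-sided exact p-value for the sum of top-left cells over strata.

    Illustrative sketch only (assumed layout: [[a, b], [c, d]] per stratum).
    Returns (observed_sum, p_value).
    """
    # Convolve the per-stratum distributions to get the distribution of the sum.
    dist = {0: Fraction(1)}
    for table in tables:
        pmf = hypergeom_pmf(table)
        updated = {}
        for s, p in dist.items():
            for x, q in pmf.items():
                updated[s + x] = updated.get(s + x, Fraction(0)) + p * q
        dist = updated

    observed = sum(t[0][0] for t in tables)
    p_obs = dist[observed]
    # Two-sided p-value: total probability of outcomes no more likely
    # than the observed one (the convention used by Fisher's exact test).
    p_value = sum(p for p in dist.values() if p <= p_obs)
    return observed, float(p_value)


if __name__ == "__main__":
    # Made-up toy counts for two strata, purely for illustration.
    strata = [[[3, 0], [0, 3]], [[3, 0], [0, 3]]]
    print(exact_stratified_test(strata))
```

With a single stratum this reduces to the two-sided Fisher's exact test. Exact rational arithmetic (`Fraction`) is used so that ties in probability are compared without floating-point error, which is affordable at the small sample sizes the abstract has in mind.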

Results: Application of our method to IPMN data successfully identified susceptibility genes associated with the progression of IPMN to pancreatic cancer.

Conclusions: Our method is expected to identify disease-associated genetic variants and their corresponding signaling pathways more successfully, improving our understanding of the etiology and prognosis of specific diseases.

Keywords: Association study; CMH statistic; Fisher’s exact test; IPMN; NGS data analysis; Small size sequencing data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Data Analysis*
  • Disease Progression
  • Female
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Male
  • Middle Aged
  • Neoplasm Metastasis
  • Pancreatic Neoplasms / genetics
  • Pancreatic Neoplasms / pathology
  • Sample Size