The Impact of Diagnostic Code Misclassification on Optimizing the Experimental Design of Genetic Association Studies

J Healthc Eng. 2017:2017:7653071. doi: 10.1155/2017/7653071. Epub 2017 Oct 18.

Abstract

Diagnostic codes within electronic health record systems can vary widely in accuracy. It has been noted that the accuracy of disease phenotype classification increases monotonically with the number of instances of a particular diagnostic code recorded per individual. As a growing number of health system databases become linked with genomic data, it is critically important to understand the effect of this misclassification on the power of genetic association studies. Here, I investigate the impact of diagnostic code misclassification on the power of genetic association studies, with the aim of better informing experimental designs that use health informatics data. The trade-off between (i) reduced misclassification rates from requiring additional instances of a diagnostic code per individual and (ii) the resulting smaller sample size is explored, and general rules are presented to improve experimental designs.
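
The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the model used in the paper. It assumes a simple two-sided allelic z-test, that individuals wrongly labeled as cases carry the risk allele at the control frequency, and that controls are classified without error. The function name `allelic_power`, the default significance threshold of 5e-8, and every number in `hypothetical_designs` (case counts and positive predictive values per code-count threshold) are illustrative assumptions, not values from the study.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(n_cases, n_controls, ppv, f_control, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic z-test with misclassified cases.

    ppv is the fraction of labeled cases that truly have the disease.
    Assumption: false cases have the control allele frequency.
    """
    # Risk-allele frequency in true cases, derived from the allelic odds ratio.
    odds = odds_ratio * f_control / (1.0 - f_control)
    f_case = odds / (1.0 + odds)

    # Labeled cases are a mixture of true cases and misclassified non-cases.
    f_observed = ppv * f_case + (1.0 - ppv) * f_control

    # Two alleles are counted per individual.
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    se = sqrt(
        f_observed * (1.0 - f_observed) / case_alleles
        + f_control * (1.0 - f_control) / control_alleles
    )

    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(f_observed - f_control) / se

    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical: requiring more code instances raises PPV but shrinks the case set.
    hypothetical_designs = {
        1: (10000, 0.60),  # minimum code instances: (labeled cases, assumed PPV)
        2: (6000, 0.80),
        3: (3500, 0.90),
    }
    for min_codes, (n_cases, ppv) in hypothetical_designs.items():
        power = allelic_power(n_cases, 10000, ppv, f_control=0.30, odds_ratio=1.2)
        print(f"min codes = {min_codes}: cases = {n_cases}, PPV = {ppv:.2f}, power = {power:.3f}")
```

Comparing the printed power across thresholds shows how, under these assumptions, a stricter case definition can either help or hurt depending on how quickly the sample shrinks relative to the gain in accuracy.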

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Electronic Health Records
  • Genetic Association Studies*
  • Genomics
  • Humans
  • International Classification of Diseases*
  • Machine Learning
  • Medical Informatics
  • Models, Statistical
  • Normal Distribution
  • Phenotype
  • Predictive Value of Tests
  • Reproducibility of Results
  • Research Design*
  • Sample Size