A Poisson binomial-based statistical testing framework for comorbidity discovery across electronic health record datasets

Nat Comput Sci. 2021 Oct;1(10):694-702. doi: 10.1038/s43588-021-00141-9. Epub 2021 Oct 21.

Abstract

Discovering the concomitant occurrence of distinct medical conditions in a patient, also known as comorbidities, is a prerequisite for creating patient outcome prediction tools. Current comorbidity discovery applications are designed for small datasets and use stratification to control for confounding variables such as age, sex or ancestry. Stratification lowers false positive rates, but reduces power, as the size of the study cohort is decreased. Here we describe a Poisson binomial-based approach to comorbidity discovery (PBC) designed for big-data applications that circumvents the need for stratification. PBC adjusts for confounding demographic variables on a per-patient basis and models temporal relationships. We benchmark PBC using two datasets to compute comorbidity statistics on 4,623,841 pairs of potentially comorbid medical terms. The results of this computation are provided as a searchable web resource. Compared with current methods, the PBC approach reduces false positive associations while retaining statistical power to discover true comorbidities.