Background and objective: Electronic health records databases are increasingly used for identifying cohort populations, covariates, or outcomes, but discerning such clinical 'phenotypes' accurately is an ongoing challenge. We developed a flexible method using overlapping (Venn diagram) queries. Here we describe this approach to find patients hospitalized with acute congestive heart failure (CHF), a sampling strategy for one-by-one 'gold standard' chart review, and calculation of positive predictive value (PPV) and sensitivities, with SEs, across different definitions.
Materials and methods: We used retrospective queries of hospitalizations (2002-2011) in the Indiana Network for Patient Care with any CHF ICD-9 diagnoses, a primary diagnosis, an echocardiogram performed, a B-natriuretic peptide (BNP) drawn, or BNP >500 pg/mL. We used a hybrid between proportional sampling by Venn zone and over-sampling non-overlapping zones. The acute CHF (presence/absence) outcome was based on expert chart review using a priori criteria.
Results: Among 79,091 hospitalizations, we reviewed 908. A query for any ICD-9 code for CHF had PPV 42.8% (SE 1.5%) for acute CHF and sensitivity 94.3% (1.3%). Primary diagnosis of 428 and BNP >500 pg/mL had PPV 90.4% (SE 2.4%) and sensitivity 28.8% (1.1%). PPV was <10% when there was no echocardiogram, no BNP, and no primary diagnosis. 'False positive' hospitalizations were for other heart disease, lung disease, or other reasons.
Conclusions: This novel method successfully allowed flexible application and validation of queries for patients hospitalized with acute CHF.
Keywords: Algorithms; Electronic Health Records; Heart Failure; Phenotypes; Predictive Value of Tests; Validation Studies.