The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation

S van Doorn; T B Brakenhoff; K G M Moons; F H Rutten; A W Hoes; R H H Groenwold; G J Geersing

doi:10.1186/s41512-017-0018-x


Diagn Progn Res. 2017 Nov 16:1:18. doi: 10.1186/s41512-017-0018-x. eCollection 2017.

Authors

S van Doorn¹, T B Brakenhoff¹, K G M Moons¹, F H Rutten¹, A W Hoes¹, R H H Groenwold¹, G J Geersing¹

Affiliation

¹ Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, PO Box 85500, 3508 AB Utrecht, The Netherlands.

Abstract

Background: Research on prognostic prediction models frequently uses data from routine healthcare. However, misclassification of predictors in such data may strongly affect the studied associations. Such misclassification can clearly lead to the derivation of suboptimal prediction models, but the extent to which it affects the validation of existing prediction models is currently unclear. We aimed to quantify the amount of misclassification in routine care data and its effect on the validation of an existing risk prediction model. As an illustrative example, we validated the CHA2DS2-VASc prediction rule for predicting mortality in patients with atrial fibrillation (AF).
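The abstract does not list the components of the rule, so the sketch below tallies the standard published CHA2DS2-VASc points. It is a minimal illustration that assumes one boolean flag per predictor. The `Patient` fields and the `cha2ds2_vasc` function are hypothetical names, not taken from the study. Each predictor adds a fixed number of points, so a predictor misclassified in routine care data shifts the score directly. For example, a heart failure diagnosis missing from the electronic record lowers the score by one point.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout: one flag per CHA2DS2-VASc component
    age: int
    female: bool
    heart_failure: bool      # C: congestive heart failure
    hypertension: bool       # H: hypertension
    diabetes: bool           # D: diabetes mellitus
    stroke_tia_te: bool      # S2: prior stroke, TIA or thromboembolism
    vascular_disease: bool   # V: e.g. prior myocardial infarction, peripheral artery disease


def cha2ds2_vasc(p: Patient) -> int:
    """Return the CHA2DS2-VASc score (0-9) using the standard point weights."""
    score = 0
    score += 1 if p.heart_failure else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:
        score += 2           # A2: age 75 or older
    elif p.age >= 65:
        score += 1           # A: age 65-74
    score += 1 if p.diabetes else 0
    score += 2 if p.stroke_tia_te else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0   # Sc: sex category (female)
    return score


# Example: a 78-year-old woman with hypertension and diabetes
example = Patient(age=78, female=True, heart_failure=False, hypertension=True,
                  diabetes=True, stroke_tia_te=False, vascular_disease=False)
print(cha2ds2_vasc(example))  # 5
```

Computing the score twice per patient, once from routinely recorded flags and once from manually verified flags, yields the index and reference versions of the same prediction rule.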

Methods: In a prospective cohort in general practice in the Netherlands, we used data retrieved by computer from the electronic medical records of patients with known AF as index predictors. Data collected manually by scrutinizing the complete medical files served as reference predictors. By comparing the index with the reference predictors, we assessed misclassification of individual predictors using Cohen's kappa and other diagnostic test accuracy measures. Predictive performance was quantified by the c-statistic and by assessing the calibration of the multivariable models.
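Comparing index with reference predictors amounts to cross-tabulating, for each binary predictor, what the routine data recorded against what the manual review confirmed. The sketch below computes Cohen's kappa and the usual accuracy measures from that 2×2 table, treating the reference as the truth. It is a minimal illustration. The function name `agreement_measures` and the toy data are hypothetical. The code assumes both classes occur in each source so that no denominator is zero.

```python
from collections import Counter
from typing import Sequence


def agreement_measures(index: Sequence[bool], reference: Sequence[bool]) -> dict:
    """Compare a routinely recorded (index) binary predictor with the
    manually verified (reference) predictor for the same patients."""
    if len(index) != len(reference):
        raise ValueError("index and reference must have equal length")
    counts = Counter(zip(index, reference))
    tp = counts[(True, True)]    # recorded and confirmed
    fp = counts[(True, False)]   # recorded but not confirmed
    fn = counts[(False, True)]   # confirmed but missing from routine data
    tn = counts[(False, False)]  # absent in both
    n = tp + fp + fn + tn

    observed = (tp + tn) / n
    # Chance agreement from the marginal prevalence in each source
    p_index = (tp + fp) / n
    p_ref = (tp + fn) / n
    expected = p_index * p_ref + (1 - p_index) * (1 - p_ref)
    kappa = (observed - expected) / (1 - expected)

    return {
        "kappa": kappa,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy data for ten patients (hypothetical, not from the study)
index = [True, True, False, False, True, False, False, False, True, False]
reference = [True, False, False, False, True, True, False, False, True, False]
m = agreement_measures(index, reference)
print(round(m["kappa"], 3))  # 0.583
```

Kappa corrects raw agreement (here 0.8) for the agreement expected by chance from the two prevalences. This correction is why it is a stricter summary of misclassification than percentage agreement alone.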

Results: In total, 2363 AF patients were included. After a median follow-up of 2.7 years (IQR 2.3-3.0), 368 patients died (incidence rate 6.2 deaths per 100 person-years). Misclassification of individual predictors ranged from substantial (Cohen's kappa 0.56 for a history of heart failure) to minor (kappa 0.90 for a history of type 2 diabetes). Overall model performance was similar whether index or reference predictors were used, with c-statistics of 0.684 and 0.681, respectively, and similar calibration.
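For right-censored mortality data, one common way to compute a c-statistic is Harrell's concordance index. The abstract does not state which estimator was used, so this choice is an assumption. The sketch below is a minimal O(n²) version. A pair counts as usable when the patient with the shorter follow-up died, and as concordant when that patient also had the higher predicted risk. Pairs with tied follow-up times are skipped for simplicity. The function `harrell_c` and the toy data are hypothetical.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    Simplification: pairs with identical follow-up times are not used.
    Ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            if time[i] < time[j]:  # j was still alive when i died
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy cohort of five patients (hypothetical, not from the study)
time = [1.0, 2.0, 3.0, 4.0, 5.0]
event = [True, True, False, True, False]
score_index = [5, 3, 3, 2, 1]       # score from routinely recorded predictors
score_reference = [4, 4, 2, 3, 1]   # score from manually verified predictors
print(harrell_c(time, event, score_index))      # 0.9375
print(harrell_c(time, event, score_reference))  # 0.9375
```

In this toy example the two scores disagree for several patients yet give the same c-statistic, because the c-statistic only reflects how well patients are ranked. Integer scores such as CHA2DS2-VASc produce many ties in predicted risk, so the half-credit rule for ties noticeably affects the estimate.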

Conclusion: In a case study validating the CHA2DS2-VASc prediction model, we found substantial predictor misclassification in routine healthcare data, with only a limited effect on overall model performance. This study should be repeated for other frequently applied prediction models to further evaluate the usefulness of routinely available healthcare data for validating prognostic models in the presence of predictor misclassification.

Keywords: Atrial fibrillation; CHA2DS2-VASc; Misclassification; Prediction model; Routine care data; Validation.