Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches

Mei-Sing Ong; Jeffrey G Klann; Kueiyu Joshua Lin; Bradley A Maron; Shawn N Murphy; Marc D Natter; Kenneth D Mandl

doi:10.1161/JAHA.120.016648

J Am Heart Assoc. 2020 Oct 20;9(19):e016648. doi: 10.1161/JAHA.120.016648. Epub 2020 Sep 29.

Authors

Mei-Sing Ong¹ ², Jeffrey G Klann³, Kueiyu Joshua Lin⁴, Bradley A Maron⁵, Shawn N Murphy⁶, Marc D Natter² ⁷, Kenneth D Mandl² ⁷ ⁸

Affiliations

¹ Department of Population Medicine, Harvard Medical School & Harvard Pilgrim Health Care Institute, Boston, MA.
² Computational Health Informatics Program, Boston Children's Hospital, Boston, MA.
³ Laboratory of Computer Science, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
⁴ Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.
⁵ Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.
⁶ Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
⁷ Department of Pediatrics, Harvard Medical School, Boston, MA.
⁸ Department of Biomedical Informatics, Harvard Medical School, Boston, MA.

Abstract

Background: Real-world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts, a crucial first step underpinning the validity of research results, remains a challenge. We developed and evaluated claims-based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state-of-the-art machine-learning approaches.

Methods and Results: We analyzed an electronic health record-Medicare linked database from two large academic tertiary care hospitals (years 2007-2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with PH (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients' demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules, and machine-learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The best-performing rule-based algorithm (having ≥3 PH-related healthcare encounters and having undergone right heart catheterization) attained an area under the receiver operating characteristic curve of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine-learning algorithms outperformed the best-performing rule-based algorithm (P<0.001). A model derived from the random forest algorithm achieved an area under the receiver operating characteristic curve of 0.88 (sensitivity, 0.87; specificity, 0.70), and gradient boosting machine achieved comparable results (area under the receiver operating characteristic curve, 0.85; sensitivity, 0.87; specificity, 0.70). Penalized lasso regression achieved an area under the receiver operating characteristic curve of 0.73 (sensitivity, 0.70; specificity, 0.68).

Conclusions: Research-grade case identification algorithms for PH can be derived and rigorously validated using machine-learning algorithms. Simple decision rules commonly applied in published literature performed poorly; more complex rule-based algorithms may address the limitations of this approach. PH research using claims data would be considerably strengthened through the use of validated algorithms for cohort ascertainment.
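
As an illustration only, the rule-based approach described in the abstract (≥3 PH-related encounters plus right heart catheterization) and its evaluation against a chart-reviewed gold standard might be sketched as follows. This is not the authors' code: the `PatientClaims` record, its field names, and the toy cohort are hypothetical assumptions, and the resulting numbers bear no relation to the study's findings.

```python
from dataclasses import dataclass


@dataclass
class PatientClaims:
    """Hypothetical per-patient claims summary (field names are assumptions)."""
    ph_encounters: int        # encounters carrying a PH-related diagnosis code
    had_rhc: bool             # any claim for right heart catheterization
    chart_confirmed_ph: bool  # gold-standard label from chart review


def rule_positive(p, min_encounters=3):
    """Decision rule: enough PH-related encounters AND a right heart catheterization."""
    return p.ph_encounters >= min_encounters and p.had_rhc


def sensitivity_specificity(patients):
    """Compare rule output with chart review; needs at least one case and one non-case."""
    tp = fn = tn = fp = 0
    for p in patients:
        predicted = rule_positive(p)
        if p.chart_confirmed_ph:
            if predicted:
                tp += 1
            else:
                fn += 1
        else:
            if predicted:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy cohort, invented purely for illustration.
cohort = [
    PatientClaims(5, True, True),
    PatientClaims(3, True, True),
    PatientClaims(2, True, True),    # missed by the rule: too few encounters
    PatientClaims(4, False, True),   # missed by the rule: no catheterization
    PatientClaims(4, True, False),   # false positive
    PatientClaims(1, False, False),
    PatientClaims(0, False, False),
    PatientClaims(3, False, False),
]

sens, spec = sensitivity_specificity(cohort)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A fixed rule of this kind yields a single sensitivity/specificity pair, whereas the machine-learning models in the study produce a score per patient that can be thresholded at different operating points.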

Keywords: computable phenotype; machine learning; pulmonary hypertension.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Aged
Algorithms*
Decision Support Techniques
Female
Humans
Hypertension, Pulmonary / epidemiology*
Insurance Claim Review*
Machine Learning*
Male
