NATURAL LANGUAGE PROCESSING BASED MACHINE LEARNING MODEL USING CARDIAC MRI REPORTS TO IDENTIFY HYPERTROPHIC CARDIOMYOPATHY PATIENTS

Divaakar Siva Baala Sundaram; Shivaram P Arunachalam; Devanshi N Damani; Nasibeh Zanjirani Farahani; Moein Enayati; Kalyan S Pasupathy; Adelaide M Arruda-Olson

doi:10.1115/dmd2021-1076

Proc Des Med Devices Conf. 2021 Apr:2021:V001T03A005. doi: 10.1115/dmd2021-1076. Epub 2021 May 11.

Authors

Divaakar Siva Baala Sundaram¹, Shivaram P Arunachalam¹, Devanshi N Damani¹, Nasibeh Zanjirani Farahani¹, Moein Enayati¹, Kalyan S Pasupathy¹, Adelaide M Arruda-Olson¹

Affiliation

¹ Mayo Clinic Rochester, MN.

Abstract

Hypertrophic cardiomyopathy (HCM) is the most common genetic heart disease in the US and is a known cause of sudden cardiac death (SCD) in young adults. Although HCM diagnosis and management have advanced significantly, HCM cases still need to be identified from electronic health record (EHR) data. Automated tools based on natural language processing (NLP) guided machine learning (ML) models could identify HCM cases accurately, improving management and reducing adverse outcomes for HCM patients. Cardiac magnetic resonance (CMR) imaging plays a significant role in HCM diagnosis and risk stratification. CMR reports, generated by clinician annotation, offer rich data in the form of cardiac measurements as well as narratives describing interpretation and phenotypic description. The purpose of this study is to develop an NLP-based interpretable model that uses impressions extracted from CMR reports to automatically identify HCM patients. CMR reports of patients with a suspected HCM diagnosis between 1995 and 2019 were used in this study. Patients were classified into three categories: yes HCM, no HCM, and possible HCM. A random forest (RF) model was developed to assess how well CMR measurements and impression features identify HCM patients. The RF model yielded an accuracy of 86% with 608 features and 85% with 30 features. These results offer promise for accurate identification of HCM patients from CMR reports in the EHR, supporting efficient clinical management that could transform health care delivery for these patients.
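The abstract does not describe how impression text was converted into model features. As a rough illustration only, the sketch below shows one common approach: turning free-text impressions into bag-of-words count vectors paired with the three class labels. The example sentences, the tokenizer, and all function names are hypothetical assumptions and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical, invented impression snippets; NOT taken from the study data.
REPORTS = [
    ("Asymmetric septal hypertrophy consistent with hypertrophic cardiomyopathy.",
     "yes HCM"),
    ("Normal left ventricular wall thickness. No evidence of hypertrophic cardiomyopathy.",
     "no HCM"),
    ("Mild septal hypertrophy; hypertrophic cardiomyopathy cannot be excluded.",
     "possible HCM"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Return a sorted list of every distinct token across all texts."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def vectorize(text, vocabulary):
    """Map one text to a list of token counts aligned with the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocabulary]


texts = [text for text, _ in REPORTS]
labels = [label for _, label in REPORTS]

vocabulary = build_vocabulary(texts)
# One row per report, one column per vocabulary token (assumed feature layout).
X = [vectorize(text, vocabulary) for text in texts]

for row, label in zip(X, labels):
    print(label, sum(row), "tokens counted")
```

A feature matrix of this shape, possibly combined with numeric CMR measurements, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. Whether the study used that library, or this kind of featurization, is not stated in the abstract.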

Keywords: cardiac MRI; electronic health records (EHR); hypertrophic cardiomyopathy (HCM); machine learning; natural language processing (NLP).

Grants and funding

K01 HL124045/HL/NHLBI NIH HHS/United States