Development and application of pharmacological statin-associated muscle symptoms phenotyping algorithms using structured and unstructured electronic health records data

JAMIA Open. 2023 Oct 24;6(4):ooad087. doi: 10.1093/jamiaopen/ooad087. eCollection 2023 Dec.

Abstract

Importance: Statins are widely prescribed cholesterol-lowering medications in the United States, but their clinical benefits can be diminished by statin-associated muscle symptoms (SAMS), leading to discontinuation.

Objectives: In this study, we aimed to develop and validate a pharmacological SAMS clinical phenotyping algorithm using electronic health records (EHRs) data from Minnesota Fairview.

Materials and methods: We retrieved structured and unstructured EHR data of statin users and manually ascertained a gold standard set of SAMS cases and controls using the published SAMS-Clinical Index tool from clinical notes in 200 patients. We developed machine learning algorithms and rule-based algorithms that incorporated various criteria, including ICD codes, statin allergy, creatine kinase elevation, and keyword mentions in clinical notes. We applied the best-performing algorithm to the statin cohort to identify SAMS.

Results: We identified 16 889 patients who started statins in the Fairview EHR system from 2010 to 2020. The combined rule-based (CRB) algorithm, which utilized both clinical notes and structured data criteria, achieved similar performance compared to machine learning algorithms with a precision of 0.85, recall of 0.71, and F1 score of 0.77 against the gold standard set. Applying the CRB algorithm to the statin cohort, we identified the pharmacological SAMS prevalence to be 1.9% and selective risk factors which included female gender, coronary artery disease, hypothyroidism, and use of immunosuppressants or fibrates.

Discussion and conclusion: Our study developed and validated a simple pharmacological SAMS phenotyping algorithm that can be used to create SAMS case/control cohort to enable further analysis which can lead to the development of a SAMS risk prediction model.

Keywords: electronic health records; hydroxymethylglutaryl-CoA reductase inhibitors; machine learning; phenotyping; precision medicine.