Familial Hypercholesterolemia Identification by Machine Learning Using Lipid Profile Data Performs as Well as Clinical Diagnostic Criteria

Reinhardt Hesse; Frederick J Raal; Dirk J Blom; Jaya A George

doi:10.1161/CIRCGEN.121.003324


Circ Genom Precis Med. 2022 Oct;15(5):e003324. doi: 10.1161/CIRCGEN.121.003324. Epub 2022 Sep 26.

Authors

Reinhardt Hesse¹, Frederick J Raal², Dirk J Blom³, Jaya A George¹

Affiliations

¹ Department of Chemical Pathology, University of the Witwatersrand, National Health Laboratory Service, Johannesburg, South Africa (R.H., J.A.G.).
² Division of Endocrinology and Metabolism, Department of Internal Medicine, University of the Witwatersrand, Johannesburg, South Africa (F.J.R.).
³ Division of Lipidology, Hatter Institute for Cardiovascular Research in Southern Africa, Department of Medicine, University of Cape Town, Cape Town, South Africa (D.J.B.).

PMID: 36154661

Abstract

Background: Familial hypercholesterolemia (FH) is a common genetic disorder that, if not diagnosed and treated early, results in premature cardiovascular disease. Most individuals with FH remain undiagnosed, and machine learning offers a new way to improve FH identification. Our objective was to create a machine learning model, based on basic lipid profile data, with better screening performance than LDL-C (low-density lipoprotein cholesterol) cutoff levels and diagnostic performance comparable to that of the Dutch Lipid Clinic Network criteria.

Methods: The model was developed by combining logistic regression, deep learning, and random forest classification and was trained on a 70% split of a data set of individuals clinically suspected of having FH. The performance of the model, the LDL-C cutoff, and the Dutch Lipid Clinic Network criteria was assessed on the internal 30% testing data set and on an external data set by comparing the areas under the receiver operating characteristic curves (AUROCs). All methods were measured against the gold standard of FH diagnosis by mutation identification. The model was also tested on 2 lower-prevalence data sets.
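The training and evaluation design described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the three classifiers were combined, so soft voting (averaging predicted probabilities) is assumed; scikit-learn's MLPClassifier stands in for the deep learning component; the four lipid-like features and all data are synthetic; and the LDL-C comparator is scored as a continuous value, which is likewise an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a basic lipid profile (assumed columns:
# total cholesterol, LDL-C, HDL-C, triglycerides). Not study data.
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)  # 1 = mutation-confirmed FH (synthetic labels)
X = rng.normal(size=(n, 4))
X[:, 1] += 1.5 * y  # make the LDL-C-like column informative

# 70% training / 30% internal testing split, stratified on FH status
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Assumed combination rule: soft voting over the three classifier types
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression())),
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0),
        )),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# Probability of FH for each individual in the held-out 30%
model_scores = ensemble.predict_proba(X_test)[:, 1]
# Single-feature comparator: the raw LDL-C-like value used as a score
ldl_scores = X_test[:, 1]

print(f"ensemble AUROC:   {roc_auc_score(y_test, model_scores):.3f}")
print(f"LDL-C-only AUROC: {roc_auc_score(y_test, ldl_scores):.3f}")
```

Soft voting is only one way to combine heterogeneous classifiers; stacking (training a meta-model on the base models' outputs) is a common alternative, and either would fit the abstract's description.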

Results: The machine learning model achieved an AUROC of 0.711 on the external data set (n=1376; FH prevalence=64%), which was superior to that of the LDL-C cutoff (AUROC=0.642) and comparable to that of the Dutch Lipid Clinic Network criteria (AUROC=0.705). The model performed even better on the medium-prevalence (n=2655; FH prevalence=20%) and low-prevalence (n=1616; FH prevalence=1%) data sets, with AUROCs of 0.801 and 0.856, respectively.
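To make the reported values concrete: an AUROC can be read as the probability that a randomly chosen individual with FH receives a higher score than a randomly chosen individual without FH, with ties counting as one half. The sketch below computes it directly from that definition on toy scores (not study data); the function name `auroc` is hypothetical.

```python
from itertools import product


def auroc(scores_pos, scores_neg):
    """Fraction of (FH, non-FH) pairs in which the FH case scores higher.

    Ties count as half a correctly ordered pair.
    """
    wins = 0.0
    for p, q in product(scores_pos, scores_neg):
        if p > q:
            wins += 1.0
        elif p == q:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy model scores, not study data
fh = [0.9, 0.8, 0.4]
non_fh = [0.7, 0.3, 0.2, 0.1]
# 11 of the 12 FH/non-FH pairs are ordered correctly, so this is 11/12
print(auroc(fh, non_fh))
```

Under this reading, the external-set AUROC of 0.711 means the model ranked the FH case above the non-FH case in about 71% of such pairs. The pairwise loop is for illustration only; rank-based implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.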

Conclusions: Despite the absence of clinical information, the model identified genetically confirmed FH in a cohort of individuals suspected of having FH better than LDL-C cutoff values did, and it performed comparably to the Dutch Lipid Clinic Network criteria. The model achieved higher AUROCs when tested on 2 cohorts with lower FH prevalence. Machine learning is, therefore, a promising tool for both screening for and diagnosing FH.

Keywords: database; early diagnosis; familial hypercholesterolemia; machine learning; mass screening; mutation; precision medicine.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Area Under Curve
Cholesterol, LDL
Humans
Hyperlipoproteinemia Type II* / diagnosis
Hyperlipoproteinemia Type II* / epidemiology
Hyperlipoproteinemia Type II* / genetics
Machine Learning
Mutation

Substances

Cholesterol, LDL