Familial Hypercholesterolemia Identification by Machine Learning Using Lipid Profile Data Performs as Well as Clinical Diagnostic Criteria

Circ Genom Precis Med. 2022 Oct;15(5):e003324. doi: 10.1161/CIRCGEN.121.003324. Epub 2022 Sep 26.

Abstract

Background: Familial hypercholesterolemia (FH) is a common genetic disorder that, if not diagnosed and treated early, results in premature cardiovascular disease. Most individuals with FH are undiagnosed, and machine learning offers a new way to improve FH identification. Our objective was to create a machine learning model from basic lipid profile data with better screening performance than LDL-C (low-density lipoprotein cholesterol) cutoff levels and diagnostic performance comparable to the Dutch Lipid Clinic Network criteria.

Methods: The model was developed by combining logistic regression, deep learning, and random forest classification, and was trained on a 70% split of a data set of individuals clinically suspected of having FH. The performance of the model, the LDL-C cutoff, and the Dutch Lipid Clinic Network criteria was assessed on the internal 30% testing data set and on an external data set by comparing the areas under the receiver operating characteristic curves (AUROC). All methods were measured against the gold standard of FH diagnosis by mutation identification. The model was also tested on 2 data sets with lower FH prevalence.
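The evaluation protocol described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the three classifiers are combined by averaging their predicted FH probabilities (soft voting) and that the 70/30 split is stratified by mutation status, neither of which the abstract specifies. The function names, toy records, toy models, and the 4.9 mmol/L LDL-C cutoff are hypothetical illustrations. AUROC is computed as the Mann-Whitney probability that a randomly chosen mutation carrier scores higher than a randomly chosen non-carrier, which equals the area under the empirical ROC curve.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (lipid-profile features, label), label 1 = FH mutation identified.
Record = Tuple[Sequence[float], int]
ProbModel = Callable[[Sequence[float]], float]


def stratified_split(records: Sequence[Record], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Hold out test_fraction of each class for internal testing; the rest is for training."""
    rng = random.Random(seed)
    train: List[Record] = []
    test: List[Record] = []
    for label in (0, 1):
        group = [r for r in records if r[1] == label]
        rng.shuffle(group)
        n_test = round(test_fraction * len(group))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def soft_vote(models: Sequence[ProbModel], features: Sequence[float]) -> float:
    """Average the FH probabilities returned by several fitted models (assumed combination rule)."""
    return sum(m(features) for m in models) / len(models)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUROC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ldl_cutoff_score(ldl_c: float, cutoff: float) -> float:
    """Binary screening rule: 1.0 if LDL-C is at or above the cutoff, else 0.0."""
    return 1.0 if ldl_c >= cutoff else 0.0


if __name__ == "__main__":
    # Made-up toy records: features = (LDL-C mmol/L, HDL-C mmol/L); label 1 = mutation found.
    toy = [((7.1, 1.2), 1), ((6.4, 1.0), 1), ((5.9, 1.4), 0), ((5.0, 1.1), 1),
           ((4.8, 1.3), 0), ((6.9, 1.5), 1), ((4.2, 0.9), 0), ((5.6, 1.0), 0),
           ((6.1, 1.2), 1), ((4.5, 1.6), 0)]
    train, test = stratified_split(toy)
    # `train` would be used to fit the three models; here they are fixed toy stand-ins
    # for the logistic regression, deep learning, and random forest components.
    models = [lambda f: min(1.0, f[0] / 8.0),
              lambda f: min(1.0, max(0.0, (f[0] - 4.0) / 3.5)),
              lambda f: 0.8 if f[0] > 5.5 else 0.3]
    labels = [y for _, y in test]
    print("ensemble AUROC:", auroc([soft_vote(models, x) for x, _ in test], labels))
    print("cutoff AUROC:", auroc([ldl_cutoff_score(x[0], 4.9) for x, _ in test], labels))
```

The pairwise AUROC loop costs O(n_pos × n_neg), which is adequate for a sketch; a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.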

Results: The machine learning model achieved an AUROC of 0.711 on the external data set (n=1376; FH prevalence=64%), which was superior to the LDL-C cutoff (AUROC=0.642) and comparable to the Dutch Lipid Clinic Network criteria (AUROC=0.705). The model performed even better when tested on the medium-prevalence (n=2655; FH prevalence=20%) and low-prevalence (n=1616; FH prevalence=1%) data sets, with AUROC values of 0.801 and 0.856, respectively.
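One way to read the LDL-C cutoff result: if the cutoff is applied as a single yes/no threshold (an assumption; the abstract does not state how its AUROC was obtained), its ROC curve has only one interior point, (1 − Sp, Se), where Se and Sp are the sensitivity and specificity at that cutoff. Summing the two trapezoids under that curve gives:

```latex
\mathrm{AUROC}_{\text{cutoff}}
  = \underbrace{\tfrac{1}{2}\,(1-\mathrm{Sp})\,\mathrm{Se}}_{(0,0)\ \to\ (1-\mathrm{Sp},\,\mathrm{Se})}
  + \underbrace{\tfrac{1}{2}\,\mathrm{Sp}\,(\mathrm{Se}+1)}_{(1-\mathrm{Sp},\,\mathrm{Se})\ \to\ (1,1)}
  = \frac{\mathrm{Se}+\mathrm{Sp}}{2}
```

Under this reading, an AUROC of 0.642 corresponds to Se + Sp = 1.284 at the chosen cutoff, whereas graded scores such as the model's probabilities or the Dutch Lipid Clinic Network point total have an AUROC that integrates over every possible threshold.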

Conclusions: Despite the absence of clinical information, the model identified genetically confirmed FH in a cohort of individuals suspected of having FH better than LDL-C cutoff values did, and its performance was comparable to that of the Dutch Lipid Clinic Network criteria. The model achieved higher AUROC values when tested on 2 cohorts with lower FH prevalence. Machine learning is, therefore, a promising tool for both screening for and diagnosing FH.

Keywords: database; early diagnosis; familial hypercholesterolemia; machine learning; mass screening; mutation; precision medicine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Cholesterol, LDL
  • Humans
  • Hyperlipoproteinemia Type II* / diagnosis
  • Hyperlipoproteinemia Type II* / epidemiology
  • Hyperlipoproteinemia Type II* / genetics
  • Machine Learning
  • Mutation

Substances

  • Cholesterol, LDL