Machine learning models reveal distinct disease subgroups and improve diagnostic and prognostic accuracy for individuals with pathogenic SCN8A gain-of-function variants

Biol Open. 2024 Apr 15;13(4):bio060286. doi: 10.1242/bio.060286. Epub 2024 Apr 24.

Abstract

Distinguishing clinical subgroups among patients with diseases characterized by a wide phenotypic spectrum is essential for developing precision therapies. Patients with gain-of-function (GOF) variants in the SCN8A gene exhibit substantial clinical heterogeneity, historically viewed as a linear spectrum ranging from mild to severe. To test for hidden clinical subgroups, we applied two machine-learning algorithms to a dataset of patient features collected by the International SCN8A Patient Registry. We used two research methodologies: a supervised approach that incorporated feature severity cutoffs based on clinical conventions, and an unsupervised approach employing an entirely data-driven strategy. Both approaches found statistical support for three distinct subgroups and were validated by correlation analyses using external variables. However, the distinguishing features of the three subgroups within each approach were not concordant, suggesting a more complex phenotypic landscape. The unsupervised approach yielded strong support for a model involving three partially ordered subgroups rather than a linear spectrum. Application of these machine-learning approaches may lead to improved prognosis and clinical management of individuals with SCN8A GOF variants and provide insights into the underlying mechanisms of the disease.
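As a rough illustration of what an entirely data-driven test for subgroups can look like, the sketch below clusters synthetic two-feature data for several candidate subgroup counts and keeps the count with the highest mean silhouette score. The abstract does not name the algorithms used, so k-means and the silhouette criterion are swapped in here as assumptions, not the authors' method. The data are simulated rather than registry data, and every function name (`make_synthetic_patients`, `farthest_point_init`, `kmeans`, `silhouette`) is hypothetical.

```python
import random
from math import dist


def make_synthetic_patients(seed=0, per_group=30):
    """Simulate three well-separated 2-D groups (stand-ins for two severity scores)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]  # assumed, not from the paper
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(per_group)]


def farthest_point_init(points, k):
    """Deterministic start: repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to lose all its members.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette: near 1 for compact, well-separated clusters."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            continue  # singleton or single-cluster case contributes 0
        a = sum(own) / len(own)                                   # within-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())     # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)


points = make_synthetic_patients()
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("best-supported number of subgroups:", best_k)
```

On real registry features the separation would be far less clean than in this simulation, and the choice of clustering method, distance measure, and model-selection criterion would each need justification.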

Keywords: Clinical phenotype; Genetic epilepsy; Patient registry; Pediatric epilepsy; Rare disease; Unsupervised learning.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Algorithms
  • Female
  • Gain of Function Mutation
  • Genetic Predisposition to Disease
  • Humans
  • Machine Learning*
  • Male
  • NAV1.6 Voltage-Gated Sodium Channel* / genetics
  • Phenotype
  • Prognosis

Substances

  • NAV1.6 Voltage-Gated Sodium Channel
  • SCN8A protein, human