Using Machine Learning to Explore Shared Genetic Pathways and Possible Endophenotypes in Autism Spectrum Disorder

Genes (Basel). 2023 Jan 25;14(2):313. doi: 10.3390/genes14020313.

Abstract

Autism spectrum disorder (ASD) is a heterogeneous condition characterized by complex genetic architectures and intertwined genetic/environmental interactions. Novel approaches that can analyze large amounts of data are needed to disentangle its pathophysiology. We present an advanced machine learning technique, based on a clustering analysis of genotypical/phenotypical embedding spaces, to identify biological processes that might act as pathophysiological substrates for ASD. This technique was applied to the VariCarta database, which contained 187,794 variant events retrieved from 15,189 individuals with ASD. Nine clusters of ASD-related genes were identified. The three largest clusters together included 68.6% of all individuals, comprising 1455 (38.0%), 841 (21.9%), and 336 (8.7%) individuals, respectively. Enrichment analysis was applied to isolate clinically relevant ASD-associated biological processes. Two of the identified clusters were characterized by individuals with an increased presence of variants linked to biological processes and cellular components such as axon growth and guidance, synaptic membrane components, and synaptic transmission. The study also suggested other clusters with possible genotype-phenotype associations. Innovative methodologies, including machine learning, can improve our understanding of the biological processes and gene variant networks that underlie the etiology and pathogenic mechanisms of ASD. Future work to ascertain the reproducibility of the presented methodology is warranted.
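
The abstract does not state which statistical test was used for the enrichment analysis. As an illustration only, the sketch below shows one common choice, a one-sided hypergeometric (over-representation) test, which asks whether the genes of a cluster overlap a biological-process annotation more often than expected by chance. The function name, parameter names, and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_cluster, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes in the background set (assumed known)
    n_annotated:  background genes annotated to the biological process
    n_cluster:    genes in the cluster being tested
    n_overlap:    cluster genes that carry the annotation
    """
    # Number of ways to draw a cluster of this size from the background.
    total = comb(n_background, n_cluster)
    # The overlap cannot exceed either the annotated set or the cluster size.
    upper = min(n_annotated, n_cluster)
    # Count the draws that contain at least n_overlap annotated genes.
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_cluster - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20,000 background genes, 300 annotated to a
# process, a cluster of 150 genes, 12 of which carry the annotation.
p = enrichment_p_value(20000, 300, 150, 12)
print(f"enrichment p-value: {p:.3g}")
```

In practice a test of this kind is run once per annotation term, and the resulting p-values are corrected for multiple testing before any term is reported as enriched.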

Keywords: Autism spectrum disorder (ASD); cluster analysis; connectivity; gene networks; genotype–phenotype embedding; machine learning; neurite morphogenesis; neurobehavioral phenotypes; neurotransmission; patient similarity analytics; synapses.

MeSH terms

  • Autism Spectrum Disorder* / genetics
  • Endophenotypes
  • Genetic Association Studies
  • Humans
  • Machine Learning
  • Reproducibility of Results

Grants and funding

This research received no external funding.