Clustering Analysis Supports the Detection of Biological Processes Related to Autism Spectrum Disorder

Genes (Basel). 2020 Dec 9;11(12):1476. doi: 10.3390/genes11121476.

Abstract

Genome sequencing has identified a large number of putative autism spectrum disorder (ASD) risk genes, revealing possibly disrupted biological pathways; however, the genetic and environmental underpinnings of ASD remain largely unknown. The presented methodology aimed to identify genetically related clusters of individuals with ASD. Using the VariCarta dataset, which contains data from 13,069 people with ASD, we compared patients pairwise to build "patient similarity matrices". Hierarchical agglomerative clustering and heatmap visualization were performed, followed by enrichment analysis (EA). We analyzed whole-genome sequencing data from 2062 individuals and isolated 11,609 genetic variants shared by at least two people. The analysis yielded three clusters, composed of 574 (27.8%), 507 (24.6%), and 650 (31.5%) individuals, respectively. Overall, 4187 variants (36.1%) were common to the three clusters. The EA revealed that the biological processes related to the shared genetic variants were mainly involved in neuron projection guidance and morphogenesis, cell junctions, synapse assembly, and observational, imitative, and vocal learning. The study highlighted genetic networks that were more frequent in a sample of people with ASD than in the overall population. We suggest that itemizing not only single variants but also gene networks might support research into ASD etiopathology. Future work on larger databases will have to ascertain the reproducibility of this methodology.
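
The pipeline described above (keeping variants shared by at least two people, comparing patients pairwise, then clustering agglomeratively) can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the linkage criterion, so the Jaccard index and average linkage used here are assumptions, as are all function names and the toy variant identifiers; this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def keep_shared_variants(patients, min_carriers=2):
    """Drop variants carried by fewer than `min_carriers` patients."""
    counts = Counter(v for variants in patients for v in variants)
    return [{v for v in variants if counts[v] >= min_carriers}
            for variants in patients]


def jaccard_similarity(a, b):
    """Jaccard index of two variant sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(patients):
    """Symmetric patient-by-patient similarity matrix."""
    n = len(patients)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard_similarity(patients[i], patients[j])
    return sim


def agglomerative_clusters(sim, n_clusters):
    """Naive average-linkage agglomerative clustering on a similarity matrix.

    Returns a sorted list of clusters, each a sorted list of patient indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def average_similarity(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_similarity(clusters[pair[0]],
                                                clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: a < b, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy data: six hypothetical patients, each a set of made-up variant IDs.
patients = [
    {"v1", "v2", "v3"},
    {"v1", "v2", "v9"},  # "v9" is private to this patient and gets filtered
    {"v1", "v2", "v3"},
    {"v4", "v5", "v6"},
    {"v4", "v5"},
    {"v5", "v6"},
]

shared = keep_shared_variants(patients)
sim = similarity_matrix(shared)
clusters = agglomerative_clusters(sim, n_clusters=2)
print(clusters)
```

The naive merge loop is only practical for small toy inputs; on a cohort of roughly two thousand individuals, an optimized library routine for hierarchical clustering would be the more realistic choice.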

Keywords: autism spectrum disorder (ASD); cluster analysis; connectivity; gene networks; neurite morphogenesis; patient similarity analytics; synapse assembly.

MeSH terms

  • Autism Spectrum Disorder / genetics*
  • Autism Spectrum Disorder / metabolism
  • Autism Spectrum Disorder / physiopathology*
  • Cluster Analysis
  • Databases, Genetic
  • Female
  • Gene Regulatory Networks / genetics*
  • Genetic Predisposition to Disease / genetics
  • Humans
  • Male
  • Whole Genome Sequencing / methods