Machine Learning Algorithms Applied to Predict Autism Spectrum Disorder Based on Gut Microbiome Composition

Biomedicines. 2023 Sep 26;11(10):2633. doi: 10.3390/biomedicines11102633.

Abstract

The application of machine learning (ML) techniques stands as a reliable method for aiding in the diagnosis of complex diseases. Recent studies have related the composition of the gut microbiota to the presence of autism spectrum disorder (ASD), but until now, the results have been mostly contradictory. This work proposes using machine learning to study the gut microbiome composition and its role in the early diagnosis of ASD. We applied support vector machines (SVMs), artificial neural networks (ANNs), and random forest (RF) algorithms to classify subjects as neurotypical (NT) or having ASD, using published data on gut microbiome composition. Naive Bayes, k-nearest neighbors, ensemble learning, logistic regression, linear regression, and decision trees were also trained and validated; however, the ones presented showed the best performance and interpretability. All the ML methods were developed using the SAS Viya software platform. The microbiome's composition was determined using 16S rRNA sequencing technology. The application of ML yielded a classification accuracy as high as 90%, with a sensitivity of 96.97% and specificity reaching 85.29%. In the case of the ANN model, no errors occurred when classifying NT subjects from the first dataset, indicating a significant classification outcome compared to traditional tests and data-based approaches. This approach was repeated with two datasets, one from the USA and the other from China, resulting in similar findings. The main predictors in the obtained models differ between the analyzed datasets. The most important predictors identified from the analyzed datasets are Bacteroides, Lachnospira, Anaerobutyricum, and Ruminococcus torques. Notably, among the predictors in each model, there is the presence of bacteria that are usually considered insignificant in the microbiome's composition due to their low relative abundance. This outcome reinforces the conventional understanding of the microbiome's influence on ASD development, where an imbalance in the composition of the microbiota can lead to disrupted host-microbiota homeostasis. Considering that several previous studies focused on the most abundant genera and neglected smaller (and frequently not statistically significant) microbial communities, the impact of such communities has been poorly analyzed. The ML-based models suggest that more research should focus on these less abundant microbes. A novel hypothesis explains the contradictory results in this field and advocates for more in-depth research to be conducted on variables that may not exhibit statistical significance. The obtained results seem to contribute to an explanation of the contradictory findings regarding ASD and its relation with gut microbiota composition. While some research correlates higher ratios of Bacillota/Bacteroidota, others find the opposite. These discrepancies are closely linked to the minority organisms in the microbiome's composition, which may differ between populations but share similar metabolic functions. Therefore, the ratios of Bacillota/Bacteroidota regarding ASD may not be determinants in the manifestation of ASD.

Keywords: ASD; artificial neural networks; autism; machine learning; microbiome; microbiota.

Grants and funding

This paper was prepared with the partial financial support of the Tecnológico de Monterrey, Institute of Advanced Materials for Sustainable Manufacturing, under the grant Challenge-Based Research Funding Program 2022 number I006-IAMSM004-C4-T2-T.