Early detection of Alzheimer's disease using single nucleotide polymorphisms analysis based on gradient boosting tree

Comput Biol Med. 2022 Jul:146:105622. doi: 10.1016/j.compbiomed.2022.105622. Epub 2022 May 24.

Abstract

Alzheimer's disease (AD) is a degenerative disorder that attacks nerve cells in the brain. AD leads to memory loss and to cognitive and intellectual impairments that can affect social activities and decision-making. Single nucleotide polymorphisms (SNPs) are the most common type of human genetic variation and are useful markers for studying complex gene-disease associations. Many common and serious diseases, including AD, have associated SNPs, so detecting SNP biomarkers linked with AD could help in the early prediction and diagnosis of the disease. The main objective of this paper is to predict and diagnose AD in its early stages from SNP biomarkers with high classification accuracy. A major difficulty is the very high number of features. The paper therefore proposes a comprehensive framework for early AD detection and for identifying the most significant genes through SNP analysis, and it also suggests using machine learning (ML) techniques to identify new AD biomarkers. In the proposed system, two feature selection techniques, the information gain filter and the Boruta wrapper, are evaluated separately for selecting the genes most significantly related to AD. Filter methods measure the relevance of features by their correlation with the dependent variable, while wrapper methods measure the usefulness of a subset of features by training a model on it. A gradient boosting tree (GBT) classifier was applied, with each of the two feature selection techniques, to all the AD genetic data of the Alzheimer's Disease Neuroimaging Initiative phase 1 (ADNI-1) and Whole-Genome Sequencing (WGS) datasets. In the whole-genome ADNI-1 analysis, the GBT learning algorithm achieved an overall accuracy of 99.06% with Boruta feature selection. With information gain feature selection, the proposed system achieved an average accuracy of 94.87%. The results indicate that the proposed system is well suited to the early detection of AD and that the Boruta wrapper feature selection is superior to the information gain filter technique.
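The information gain filter mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each SNP is encoded additively as a minor-allele count (0, 1 or 2) and that labels are binary (1 = AD, 0 = control). The helper names `entropy`, `information_gain` and `rank_snps`, the SNP IDs and the toy cohort are all hypothetical. Each SNP X is scored as IG(Y; X) = H(Y) - Σ_x P(X = x) H(Y | X = x), and only the top-ranked SNPs would be passed on to the classifier. The Boruta wrapper and the GBT model are not reproduced here.

```python
import math
from collections import Counter
from typing import Mapping, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(genotypes: Sequence[int], labels: Sequence[int]) -> float:
    """IG(Y; X) = H(Y) - sum over genotypes x of P(X = x) * H(Y | X = x)."""
    n = len(labels)
    conditional = 0.0
    for g in set(genotypes):
        # Labels of the samples carrying genotype g at this SNP.
        subset = [y for x, y in zip(genotypes, labels) if x == g]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_snps(
    snps: Mapping[str, Sequence[int]], labels: Sequence[int], top_k: int
) -> list[tuple[str, float]]:
    """Score every SNP by information gain and keep the top_k highest-scoring ones."""
    scored = [(snp_id, information_gain(g, labels)) for snp_id, g in snps.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy cohort (not ADNI data): 4 AD cases (1) followed by 4 controls (0).
# Genotypes are assumed to be minor-allele counts: 0, 1 or 2.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "rs_A": [2, 2, 2, 2, 0, 0, 0, 0],  # perfectly separates cases from controls
    "rs_B": [0, 1, 2, 0, 1, 2, 0, 1],  # weakly associated with the label
    "rs_C": [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the label
}

for snp_id, score in rank_snps(snps, labels, top_k=3):
    print(f"{snp_id}: {score:.3f}")
```

On this toy data, `rs_A` scores 1 bit, `rs_C` scores 0 and `rs_B` falls in between, so a filter keeping only the top-ranked SNP would retain `rs_A`. Because each SNP is scored independently of any model, this filter is cheap on genome-wide data, which is the trade-off the abstract contrasts with wrapper methods such as Boruta.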

Keywords: AD; Alzheimer's disease; Boruta feature selection; Classification; Diagnosis; GBT; Gradient boosting tree; Information gain feature selection; Prediction; SNPs; Single nucleotide polymorphisms.

MeSH terms

  • Alzheimer Disease* / diagnosis
  • Alzheimer Disease* / genetics
  • Biomarkers
  • Brain
  • Cognitive Dysfunction*
  • Humans
  • Magnetic Resonance Imaging / methods
  • Polymorphism, Single Nucleotide
  • Trees

Substances

  • Biomarkers