Identification of marker genes in Alzheimer's disease using a machine-learning model

Bioinformation. 2021 Feb 28;17(2):348-355. doi: 10.6026/97320630017348. eCollection 2021.

Abstract

Alzheimer's Disease (AD) is one of the most common causes of dementia, mostly affecting the elderly population. Currently, there is no proper diagnostic tool or method available for the detection of AD. The present study used two distinct data sets of AD genes, which could be potential biomarkers in the diagnosis. The differentially expressed genes (DEGs) curated from both datasets were used for machine learning classification, tissue expression annotation and co-expression analysis. Further, CNPY3, GPR84, HIST1H2AB, HIST1H2AE, IFNAR1, LMO3, MYO18A, N4BP2L1, PML, SLC4A4, ST8SIA4, TLE1 and N4BP2L1 were identified as highly significant DEGs and exhibited co-expression with other query genes. Moreover, a tissue expression study found that these genes are also expressed in the brain tissue. In addition to the earlier studies for marker gene identification, we have considered a different set of machine learning classifiers to improve the accuracy rate from the analysis. Amongst all the six classification algorithms, J48 emerged as the best classifier, which could be used for differentiating healthy and diseased samples. SMO/SVM and Logit Boost further followed J48 to achieve the classification accuracy.

Keywords: Alzheimer's Disease; Bayes Net; Biomarkers; Classifiers; Cross-validation; Decision Table; In-silico Analysis; J48; Log it Boost; Machine Learning; Naive Bayes; SMO/SVM.