A hybrid data mining model for diagnosis of patients with clinical suspicion of dementia

Comput Methods Programs Biomed. 2018 Oct:165:139-149. doi: 10.1016/j.cmpb.2018.08.016. Epub 2018 Aug 24.

Abstract

Background and objective: As the population ages, dementias have become a complex health problem throughout the world. Several machine learning methods have been applied to the task of predicting dementias. Because dementia is complex to diagnose, the great challenge lies in distinguishing patients with some type of dementia from healthy people. Diagnosis, particularly in the early stages, positively impacts the quality of life of both the patient and the family. This work presents a hybrid data mining model that integrates text mining with the mining of structured data. The model aims to assist specialists in the diagnosis of patients with clinical suspicion of dementia.

Methods: The experiments were conducted on a set of 605 medical records, with 19 different attributes, of patients with reported cognitive decline. First, a new structured attribute was created through a text mining process: it was the result of clustering each patient's pathological history, which was stored in an unstructured textual attribute. Classification algorithms (naïve Bayes, Bayesian belief networks and decision trees) were applied to obtain predictive models for Alzheimer's disease and mild cognitive impairment. Ensemble methods (Bagging, Boosting and Random Forests) were used to improve the accuracy of the generated models. These methods were applied to two datasets: one containing only the original structured data; the other containing the original structured data plus the new attribute resulting from the text mining (hybrid model).
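The idea of deriving a structured attribute from free text can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the preprocessing used, so everything below is an assumption made for illustration: a bag-of-words representation, a simple k-means-style loop with cosine similarity, made-up toy records, and hypothetical attribute names (`age`, `mmse`, `history_cluster`).

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (illustrative preprocessing only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Average a list of sparse vectors into one centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def cluster_texts(texts, k, iterations=10):
    """k-means-style clustering; seeded with the first k texts so it is deterministic."""
    vectors = [vectorize(t) for t in texts]
    centroids = vectors[:k]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each text to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Toy data, NOT taken from the study.
histories = [
    "hypertension and diabetes",
    "depression, anxiety",
    "diabetes, hypertension, high cholesterol",
    "anxiety and depression with insomnia",
]
structured = [
    {"age": 71, "mmse": 22},
    {"age": 68, "mmse": 25},
    {"age": 80, "mmse": 19},
    {"age": 75, "mmse": 24},
]

labels = cluster_texts(histories, k=2)
print(labels)  # [0, 1, 0, 1]

# Hybrid dataset: the original structured attributes plus the cluster label.
hybrid_records = [dict(rec, history_cluster=lab)
                  for rec, lab in zip(structured, labels)]
```

Any classifier or ensemble method could then be trained once on `structured` and once on `hybrid_records`, which mirrors the two-dataset comparison described above.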

Results: The accuracy metrics of the models obtained from the two datasets were compared. The results showed that the hybrid model was more effective in the diagnostic prediction of the pathologies of interest.

Conclusions: Across the different classification and clustering methods analysed, the best precision and sensitivity for the pathologies under study were obtained with hybrid models supported by ensemble methods.
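For reference, precision and sensitivity (recall) for one diagnostic class can be computed from true and predicted labels as sketched below. The class labels and predictions are invented for illustration and are not results from the study; the function name is likewise hypothetical.

```python
def precision_and_sensitivity(y_true, y_pred, positive):
    """Return (precision, sensitivity) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: share of positive predictions that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Sensitivity: share of actual positives that are detected.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


# Toy labels, NOT taken from the study.
y_true = ["AD", "AD", "MCI", "AD", "MCI", "healthy"]
y_pred = ["AD", "MCI", "MCI", "AD", "AD", "healthy"]

ad_precision, ad_sensitivity = precision_and_sensitivity(y_true, y_pred, "AD")
mci_precision, mci_sensitivity = precision_and_sensitivity(y_true, y_pred, "MCI")
```

Here the "AD" class has two true positives, one false positive and one false negative, so both its precision and its sensitivity are 2/3; for "MCI" both are 1/2.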

Keywords: Alzheimer's disease; Data mining; Medical diagnosis; Mild cognitive impairment; Text mining.

Publication types

  • Evaluation Study

MeSH terms

  • Aged
  • Algorithms
  • Alzheimer Disease / classification
  • Alzheimer Disease / diagnosis
  • Alzheimer Disease / psychology
  • Bayes Theorem
  • Cognitive Dysfunction / classification
  • Cognitive Dysfunction / diagnosis
  • Cognitive Dysfunction / psychology
  • Data Mining / methods*
  • Data Mining / statistics & numerical data
  • Decision Support Systems, Clinical / statistics & numerical data
  • Decision Trees
  • Dementia / classification
  • Dementia / diagnosis*
  • Dementia / psychology
  • Diagnosis, Computer-Assisted / methods*
  • Diagnosis, Computer-Assisted / statistics & numerical data
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Models, Statistical