Language patterns in Japanese patients with Alzheimer disease: A machine learning approach

Psychiatry Clin Neurosci. 2023 May;77(5):273-281. doi: 10.1111/pcn.13526. Epub 2023 Feb 8.

Abstract

Aim: Most previous studies have used large, publicly available data sets in Euro-American languages. The authors applied natural language processing and machine learning to explore disease-related language patterns in Japanese patients with Alzheimer disease (AD) that could serve as objective measures for assessing language ability.

Methods: The authors obtained 276 speech samples from 42 patients with AD and 52 healthy controls, aged 50 years or older. They used spaCy, a natural language processing library for Python, together with its add-on library GiNZA, a Japanese parser based on Universal Dependencies, a framework designed to facilitate multilingual parser development. eXtreme Gradient Boosting was used as the classification algorithm. Each part-of-speech and dependency unit was tagged and counted to create features such as tag frequency and tag-to-tag transition frequency. Each feature's importance was computed during 100-fold repeated random subsampling validation and averaged across the folds.
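
The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `tag_features`, the feature-key format, and the normalization of counts to rates are all assumptions, and the tag sequence is a made-up example. In practice the tags would presumably be read from the `pos_` and `dep_` attributes of spaCy tokens produced by a GiNZA pipeline.

```python
from collections import Counter


def tag_features(tags):
    """Build tag-frequency and tag-to-tag transition-frequency features.

    `tags` is a sequence of part-of-speech (or dependency) labels for one
    speech sample. Counts are divided by the number of tags or transitions;
    that normalization is an assumption, not something the abstract states.
    """
    features = {}
    n_tags = len(tags)
    if n_tags == 0:
        return features

    # Relative frequency of each tag.
    for tag, count in Counter(tags).items():
        features[f"freq:{tag}"] = count / n_tags

    # Relative frequency of each adjacent tag pair (transition).
    transitions = Counter(zip(tags, tags[1:]))
    n_transitions = n_tags - 1
    for (prev_tag, next_tag), count in transitions.items():
        features[f"trans:{prev_tag}>{next_tag}"] = count / n_transitions

    return features


# Hypothetical Universal Dependencies POS sequence for one short utterance.
example_tags = ["PRON", "ADP", "NOUN", "ADP", "VERB", "AUX"]
example_features = tag_features(example_tags)
```

One such feature dictionary per speech sample could then be assembled into a feature matrix for a gradient-boosting classifier.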

Results: The model achieved an accuracy of 0.84 (SD = 0.06) and an area under the curve of 0.90 (SD = 0.03). Of the 10 most important features for prediction, seven were related to part-of-speech and the remaining three to dependency. A box plot analysis demonstrated that the appearance rates of content word-related features were lower among the patients, whereas those of stagnation-related features were higher.
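
For readers unfamiliar with how such figures are obtained, the sketch below shows one way accuracy and area under the curve can be computed for a single validation split and then summarized as a mean and SD across repeated splits. The helper names and all numbers are hypothetical toy values, not data from the study, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Mean and sample standard deviation across validation repeats."""
    return mean(values), stdev(values)


# Toy split: 1 = patient, 0 = control; scores are predicted probabilities.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.2]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)

# Toy per-repeat accuracies, summarized the way "0.84 (SD = 0.06)" is reported.
toy_mean, toy_sd = summarize([0.80, 0.90, 0.85])
```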

Conclusion: The current study demonstrated a promising level of accuracy for predicting AD and found the language patterns corresponding to the type of lexical-semantic decline known as 'empty speech', which is regarded as a characteristic of AD.

Keywords: Alzheimer disease; dementia; machine learning; natural language processing; speech-language pathology.

MeSH terms

  • Alzheimer Disease*
  • East Asian People
  • Humans
  • Language
  • Language Disorders* / etiology
  • Machine Learning
  • Middle Aged
  • Speech