Identifying neurocognitive disorder using vector representation of free conversation

Toshiro Horigome; Kimihiro Hino; Hiroyoshi Toyoshiba; Norihisa Shindo; Kei Funaki; Yoko Eguchi; Momoko Kitazawa; Takanori Fujita; Masaru Mimura; Taishiro Kishimoto

doi:10.1038/s41598-022-16204-4

Sci Rep. 2022 Aug 3;12(1):12461. doi: 10.1038/s41598-022-16204-4.

Affiliations

¹ Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan.
² Lifescience AI Business Division, Research Development Department, FRONTEO Inc, Tokyo, Japan.
³ Department of Health Policy and Management, Keio University School of Medicine, Tokyo, Japan.
⁴ Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan. tkishimoto@keio.jp.
⁵ Hills Joint Research Laboratory for Future Preventive Medicine and Wellness, Keio University School of Medicine, 7th Floor, Roppongi Hills North Tower, 6-2-31 Roppongi, Minato-ku, Tokyo, 106-0032, Japan. tkishimoto@keio.jp.
⁶ Psychiatry at Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA. tkishimoto@keio.jp.

Abstract

In recent years, studies have reported the use of natural language processing (NLP) approaches to identify dementia. Most of these studies used picture description tasks or similar tasks to encourage spontaneous speech, but free conversation, which requires no task, might be easier to perform in a clinical setting. Moreover, free conversation is unlikely to induce a learning effect. Therefore, the purpose of this study was to develop a machine learning model that discriminates subjects with and without dementia by extracting features from unstructured free conversation data using NLP. We recruited patients who visited a specialized outpatient clinic for dementia and healthy volunteers. Participants' conversations were transcribed, and the text data were decomposed from natural sentences into morphemes by morphological analysis using NLP and then converted into real-valued vectors that were used as features for machine learning. A total of 432 datasets were used, and the resulting machine learning model classified the data for dementia and non-dementia subjects with an accuracy of 0.900, a sensitivity of 0.881, and a specificity of 0.916. Using sentence vector information, it was possible to develop a machine learning algorithm capable of discriminating dementia from non-dementia subjects with high accuracy based on free conversation.
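
The abstract reports accuracy, sensitivity, and specificity without restating how they are defined. The following is a minimal sketch of the standard definitions for a binary classifier, treating dementia as the positive class. The class name `ConfusionCounts` and the example counts are hypothetical illustrations; they are not the study's data, and the abstract does not give the underlying confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary classifier, with dementia as the positive class."""
    tp: int  # dementia correctly classified as dementia
    fn: int  # dementia misclassified as non-dementia
    tn: int  # non-dementia correctly classified as non-dementia
    fp: int  # non-dementia misclassified as dementia


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of positive (dementia) samples that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of negative (non-dementia) samples that were correctly rejected."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts chosen only to illustrate the formulas.
example = ConfusionCounts(tp=45, fn=5, tn=40, fp=10)
print(accuracy(example), sensitivity(example), specificity(example))
```

Because sensitivity and specificity are computed separately on the dementia and non-dementia groups, reporting both alongside accuracy shows whether a model's performance holds for each group rather than only in aggregate.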

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Humans
Language
Machine Learning*
Natural Language Processing*
Neurocognitive Disorders