Machine learning algorithm-based estimation model for the severity of depression assessed using Montgomery-Asberg depression rating scale

Masanori Shimamoto; Kanako Ishizuka; Kento Ohtani; Toshiya Inada; Maeri Yamamoto; Masako Tachibana; Hiroki Kimura; Yusuke Sakai; Kazuhiro Kobayashi; Norio Ozaki; Masashi Ikeda

doi:10.1002/npr2.12404

Neuropsychopharmacol Rep. 2024 Mar;44(1):115-120. doi: 10.1002/npr2.12404. Epub 2023 Dec 20.

Affiliations

¹ Department of Psychiatry, Nagoya University Graduate School of Medicine, Nagoya, Japan.
² Health Support Center, Nagoya Institute of Technology, Nagoya, Japan.
³ Department of Intelligent Systems, Nagoya University Graduate School of Informatics, Nagoya, Japan.
⁴ Department of Psychiatry, Nagoya University Hospital, Nagoya, Japan.
⁵ Human Dataware Lab., Co., Ltd., Nagoya, Japan.
⁶ Pathophysiology of Mental Disorders, Nagoya University Graduate School of Medicine, Nagoya, Japan.

Abstract

Aim: Depressive disorder is often evaluated using established rating scales. However, consistent data collection with these scales requires trained professionals. The present study assessed the "rater & estimation-system" reliability between consensus evaluations by trained psychiatrists and estimates from two models of the AI-MADRS (Montgomery-Asberg Depression Rating Scale) estimation system, a machine learning algorithm-based model developed to assess the severity of depression.

Methods: During interviews with trained psychiatrists and the AI-MADRS estimation system, patients responded orally to machine-generated voice prompts from the AI-MADRS structured interview questions. The severity scores estimated by two models of the AI-MADRS estimation system, the max estimation model and the average estimation model, were compared with the scores assigned by trained psychiatrists.

Results: A total of 51 evaluation interviews conducted on 30 patients were analyzed. Pearson's correlation coefficient with the scores evaluated by trained psychiatrists was 0.76 (95% confidence interval 0.62-0.86) for the max estimation model and 0.86 (0.76-0.92) for the average estimation model. The ANOVA-based intraclass correlation coefficient (ICC) for rater & estimation-system reliability against the trained psychiatrists' scores was 0.51 (-0.09 to 0.79) for the max estimation model and 0.75 (0.55-0.86) for the average estimation model.
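The two statistics reported above measure different things: Pearson's correlation captures linear association, whereas an ICC computed from ANOVA mean squares can also penalize systematic offsets between raters. The following is a minimal sketch of both, not the authors' analysis code. The abstract does not state which ICC form was used, so the two-way, absolute-agreement, single-measurement form, ICC(A,1), is an assumption here, as are the function names and the toy scores. The Fisher z interval also treats the interviews as independent, which is a simplification given that 51 interviews came from 30 patients.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z transform (assumes independent pairs)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def icc_a1(x, y):
    """ICC(A,1) for k=2 raters from two-way ANOVA mean squares (assumed ICC form)."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]  # one row per interview
    col_means = [sum(x) / n, sum(y) / n]             # one column per rater
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical toy scores, not study data: the "system" always scores 5 points higher.
psychiatrist = [10, 20, 30, 15, 25]
system = [s + 5 for s in psychiatrist]

r = pearson_r(psychiatrist, system)   # 1.0 up to rounding: perfect linear association
icc = icc_a1(psychiatrist, system)    # 125/150, about 0.83: the offset lowers agreement
low, high = fisher_ci(0.76, 51)       # interval around r = 0.76 with n = 51
```

In this toy case a constant offset leaves the correlation at 1.0 but pulls the absolute-agreement ICC below 1, which shows in general terms how a model can correlate well with raters yet agree with them less closely. Whether that explains the gap between the two reported statistics for the max estimation model cannot be determined from the abstract alone.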

Conclusion: The average estimation model of AI-MADRS demonstrated substantially acceptable rater & estimation-system reliability with trained psychiatrists. Accumulating a broader training dataset and refining the AI-MADRS interviews are expected to improve the performance of AI-MADRS. Our findings suggest that AI technologies could modernize, and potentially transform, depression assessment.

Keywords: MADRS; depression; natural language processing; rating scale; speech recognition.

MeSH terms

Depression*
Humans
Reproducibility of Results
