Semantic coherence markers: The contribution of perplexity metrics

Artif Intell Med. 2022 Dec:134:102393. doi: 10.1016/j.artmed.2022.102393. Epub 2022 Sep 5.

Abstract

Devising automatic tools to assist specialists in the early detection of mental disturbances and psychotic disorders is to date a challenging scientific problem and a practically relevant activity. In this work we explore how language models (i.e., probability distributions over text sequences) can be employed to analyze language and discriminate between mentally impaired and healthy subjects. We preliminarily explored whether perplexity can be considered a reliable metric for characterizing an individual's language. Perplexity was originally conceived as an information-theoretic measure assessing how well a given language model predicts a text sequence or, equivalently, how well a word sequence fits a specific language model. We carried out extensive experimentation with healthy subjects, employing language models as diverse as N-grams - from 2-grams to 5-grams - and GPT-2, a transformer-based language model. Our experiments show that, irrespective of the complexity of the employed language model, perplexity scores are stable and sufficiently consistent for analyzing the language of individual subjects, and at the same time sensitive enough to capture differences due to linguistic registers adopted by the same speaker, e.g., in interviews and political rallies. A second array of experiments was designed to investigate whether perplexity scores may be used to discriminate between the transcripts of healthy subjects and subjects suffering from Alzheimer Disease (AD). Our best performing models achieved full accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in categorizing subjects from both the AD class and the control group. These results suggest that perplexity can be a valuable analytical metric with potential application to supporting early diagnosis of symptoms of mental disorders.
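The perplexity measure described above can be illustrated with a minimal sketch: an add-one-smoothed bigram model computes PP(W) = exp(-(1/N) Σ log P(w_i | w_{i-1})), so sequences that fit the model get lower scores than out-of-distribution ones. This is a toy illustration under assumed simplifications (bigrams only, add-one smoothing, whitespace tokenization), not the authors' actual experimental pipeline, which used 2- to 5-grams and GPT-2.

```python
import math
from collections import Counter

def train_bigram(corpus_tokens):
    """Count unigram and bigram frequencies from a flat token list."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams

def bigram_perplexity(tokens, unigrams, bigrams, vocab_size):
    """Perplexity of `tokens` under an add-one-smoothed bigram model.

    PP = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), where
    P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + V).
    """
    log_prob, n = 0.0, 0
    for prev, curr in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, curr)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

# Hypothetical mini-corpus for illustration only.
corpus = "the cat sat on the mat the cat ran".split()
uni, bi = train_bigram(corpus)
vocab = len(uni)

in_domain = bigram_perplexity("the cat sat".split(), uni, bi, vocab)
out_domain = bigram_perplexity("mat sat cat".split(), uni, bi, vocab)
```

A sequence of seen bigrams yields a lower perplexity than one built from unseen bigrams, which is the contrast the study exploits when comparing transcripts across speaker groups.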

Keywords: Automatic language analysis; Diagnosis of dementia; Early diagnosis; Language models; Mental and cognitive disorders; Perplexity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease* / diagnosis
  • Benchmarking
  • Biomarkers
  • Humans
  • Linguistics
  • Semantics*

Substances

  • Biomarkers