Predictive modeling for trustworthiness and other subjective text properties in online nutrition and health communication

PLoS One. 2020 Aug 6;15(8):e0237144. doi: 10.1371/journal.pone.0237144. eCollection 2020.

Abstract

While the internet has democratized and accelerated content creation and sharing, it has also made people more vulnerable to manipulation and misinformation. Moreover, psychological biases can distort how people interpret the information they receive. This is especially problematic in health-related communication, which can greatly affect individuals' quality of life. We assembled and analyzed 364 texts related to nutrition and health from Finnish online sources, such as news, columns and blogs, and asked non-experts to subjectively evaluate the texts. The texts were rated for trustworthiness, sentiment, logic, information, clarity, and neutrality. We then estimated individual biases and consensus ratings, which were used to train regression models. First, we found that trustworthiness was significantly correlated with the information, neutrality and logic of the texts. Second, individual ratings for information and logic were significantly biased by the age and diet of the raters. Our best regression models explained up to 70% of the total variance of consensus ratings based on low-level properties of the texts, such as semantic embeddings, presence of key terms and part-of-speech tags, references, quotes and paragraphs. With a novel combination of crowdsourcing, behavioral analysis, natural language processing and predictive modeling, our study contributes to the automated identification of reliable and high-quality online information. While critical evaluation of truthfulness cannot be left to machines alone, our findings provide new insights into the automated evaluation of subjective text properties and the analysis of morphologically rich languages with regard to trustworthiness.
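The bias and consensus estimation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple additive model (rating = text consensus + rater offset + noise), uses synthetic data, and substitutes a closed-form ridge regression for whatever models the study actually trained. The function names, the 20 raters, and the 5 features are hypothetical; only the 364 texts come from the abstract. The R² reported here is in-sample and is not comparable to the 70% figure above.

```python
import numpy as np

def estimate_bias_and_consensus(ratings):
    """ratings: 2-D array (raters x texts), NaN where a rater skipped a text.

    Assumed additive model: rating = consensus[text] + bias[rater] + noise.
    """
    text_mean = np.nanmean(ratings, axis=0)                  # raw per-text average
    bias = np.nanmean(ratings - text_mean, axis=1)           # each rater's mean offset
    consensus = np.nanmean(ratings - bias[:, None], axis=0)  # bias-corrected average
    return bias, consensus

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def r_squared(y_true, y_pred):
    """Fraction of the variance of y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (hypothetical sizes except the 364 texts).
rng = np.random.default_rng(0)
n_raters, n_texts, n_features = 20, 364, 5
X = rng.normal(size=(n_texts, n_features))            # stand-in text features
true_consensus = X @ rng.normal(size=n_features)
true_bias = rng.normal(scale=0.5, size=n_raters)
ratings = (true_consensus + true_bias[:, None]
           + rng.normal(scale=0.3, size=(n_raters, n_texts)))
ratings[rng.random(ratings.shape) < 0.3] = np.nan     # each rater skips ~30% of texts

bias, consensus = estimate_bias_and_consensus(ratings)
w, b = fit_ridge(X, consensus)
r2 = r_squared(consensus, X @ w + b)
print(f"in-sample R^2 on consensus ratings: {r2:.3f}")
```

Rater offsets are only identifiable up to a shared constant in this model, so the sketch centers them implicitly by subtracting the per-text mean. A real analysis would also evaluate the regression on held-out texts rather than in-sample.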

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Communication*
  • Consumer Health Informatics / standards*
  • Consumer Health Informatics / statistics & numerical data
  • Consumer Health Information / standards*
  • Consumer Health Information / statistics & numerical data
  • Diet*
  • Healthy Lifestyle*
  • Humans
  • Internet
  • Models, Statistical
  • Trust*

Grants and funding

This work is part of the “Confidence AI” project funded by the Helsingin Sanomat Foundation (https://www.hssaatio.fi) through “The post-truth era” research program (JK, JH, JS). The work was supported by the Estonian Research Council (https://www.etag.ee) via grant MOBTT90 (PT).