Beyond belief: a cross-genre study on perception and validation of health information online

Int J Data Sci Anal. 2022;13(4):299-314. doi: 10.1007/s41060-022-00310-7. Epub 2022 Feb 2.

Abstract

Natural language undergoes significant transformation from the domain of specialized research to general news intended for wider consumption. This transition makes the information vulnerable to misinterpretation, misrepresentation, and incorrect attribution, all of which may be difficult to identify without adequate domain knowledge and may exist even in the presence of explicit citations. Moreover, newswire articles seldom provide a precise correspondence between a specific claim and its origin, making it harder to identify which claims, if any, reflect the original findings. For instance, an article stating "Flagellin shows therapeutic potential with H3N2, known as Aussie Flu" contains two claims ("Flagellin ... H3N2" and "H3N2, known as Aussie Flu") that may be true or false independently of each other, and it is prima facie unclear which claims, if any, are supported by the cited research. We build a dataset of sentences from medical news along with the sources from peer-reviewed medical research journals they cite. We use these data to study what a general reader perceives to be true, and how to verify the scientific source of claims. Unlike existing datasets, ours captures the metamorphosis of information across two genres with disparate readership and vastly different vocabularies, and we present the first empirical study of health-related fact-checking across them.

Keywords: Check-worthiness; Claim extraction; Cross-genre information retrieval; Fact-checking; Misinformation; Natural language processing.