Aggregating pairwise semantic differences for few-shot claim verification

Xia Zeng; Arkaitz Zubiaga

doi:10.7717/peerj-cs.1137

PeerJ Comput Sci. 2022 Oct 25;8:e1137. doi: 10.7717/peerj-cs.1137. eCollection 2022.

Authors

Xia Zeng¹, Arkaitz Zubiaga¹

Affiliation

¹ Queen Mary University of London, London, United Kingdom.

Abstract

As part of an automated fact-checking pipeline, the claim verification task consists of determining whether a claim is supported by an associated piece of evidence. Because labelled claim-evidence pairs are costly to gather, datasets are scarce, particularly in new domains. In this article, we introduce Semantic Embedding Element-wise Difference (SEED), a novel vector-based method for few-shot claim verification that aggregates pairwise semantic differences for claim-evidence pairs. We build on the hypothesis that we can simulate class-representative vectors that capture the average semantic difference between claim and evidence within each class, and that these vectors can then be used to classify new instances. We compare the performance of our method with competitive baselines, including fine-tuned Bidirectional Encoder Representations from Transformers (BERT) and Robustly Optimized BERT Pre-training Approach (RoBERTa) models, as well as the state-of-the-art few-shot claim verification method, which leverages language model perplexity. Experiments conducted on the Fact Extraction and VERification (FEVER) and SCIFACT datasets show consistent improvements over these baselines in few-shot settings. Our code is publicly available.
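The idea of class-representative vectors can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: claim and evidence embeddings are already available as plain lists of floats (the abstract does not specify how SEED obtains them from BERT/RoBERTa), the pairwise semantic difference is taken as claim embedding minus evidence embedding, and a new pair is assigned to the class whose averaged difference vector is nearest by Euclidean distance. The function names and the toy three-dimensional vectors are hypothetical and for illustration only.

```python
import math
from collections import defaultdict


def elementwise_difference(claim_vec, evidence_vec):
    # Assumption: the pairwise semantic difference is claim minus evidence, per dimension.
    return [c - e for c, e in zip(claim_vec, evidence_vec)]


def class_representatives(pairs, labels):
    # Average the difference vectors of the labelled (few-shot) pairs within each class.
    sums = {}
    counts = defaultdict(int)
    for (claim_vec, evidence_vec), label in zip(pairs, labels):
        diff = elementwise_difference(claim_vec, evidence_vec)
        if label not in sums:
            sums[label] = [0.0] * len(diff)
        sums[label] = [s + d for s, d in zip(sums[label], diff)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(claim_vec, evidence_vec, representatives):
    # Assumption: pick the class whose representative vector is closest in Euclidean distance.
    diff = elementwise_difference(claim_vec, evidence_vec)
    return min(representatives, key=lambda label: euclidean(diff, representatives[label]))


# Hypothetical toy "embeddings": two supporting and two refuting pairs.
support_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.1, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 0.9, 0.0]),
    ([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, -1.0, 0.0]),
]
labels = ["SUPPORTS", "SUPPORTS", "REFUTES", "REFUTES"]

reps = class_representatives(support_pairs, labels)
print(classify([0.5, 0.5, 0.0], [0.5, 0.5, 0.0], reps))  # SUPPORTS
print(classify([1.0, 0.0, 0.0], [0.0, -1.0, 0.0], reps))  # REFUTES
```

Averaging per class makes the method cheap with only a handful of labelled pairs, since no model weights are updated; the choice of difference direction and distance metric above is illustrative rather than a statement of SEED's exact design.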

Keywords: Automated fact-checking; Claim validation; Claim verification; Few-shot classification; Misinformation detection; Natural language processing; Veracity classification.

Grants and funding

This work was supported by the Engineering and Physical Sciences Research Council (Grant EP/V048597/1). Xia Zeng is funded by the China Scholarship Council (CSC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.