Using a cognitive model to understand crowdsourced data from citizen scientists

Behav Res Methods. 2023 Nov 29. doi: 10.3758/s13428-023-02289-w. Online ahead of print.

Abstract

Threatened species monitoring can produce enormous quantities of acoustic and visual recordings, which must be searched for animal detections. Data coding is extremely time-consuming for humans, and although machine-learning algorithms are emerging as useful tools for this task, they too require large numbers of known detections for training. Citizen scientists are often recruited via crowdsourcing to assist. However, the results of their coding can be difficult to interpret because citizen scientists lack comprehensive training and typically each codes only a small fraction of the full dataset. Competence may vary between citizen scientists, but without knowing the ground truth of the dataset, it is difficult to identify which citizen scientists are most competent. We used a quantitative cognitive model, cultural consensus theory, to analyze both empirical and simulated data from a crowdsourced analysis of audio recordings of Australian frogs. Several hundred citizen scientists were asked whether the calls of nine frog species were present in 1260 brief audio recordings, though most coded only a fraction of these recordings. The model estimated characteristics of both the citizen-scientist cohort (such as individual competence) and the recordings themselves. We then compared the model's output to expert coding of the recordings and found agreement between the cohort's consensus and the expert evaluation. This finding adds to the evidence that crowdsourced analyses can be used to understand large-scale datasets even when the ground truth is unknown. The model-based analysis provides a promising tool for screening large datasets before investing expert time and resources.

Keywords: Citizen science; Crowdsourcing; Data aggregation; Data analysis.
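
To illustrate the kind of aggregation cultural consensus theory performs, the sketch below fits a simplified consensus model to sparse binary presence/absence codes: each recording has a latent truth, each coder has a single competence (accuracy) parameter, and both are estimated jointly by expectation-maximization. This is a minimal sketch under assumptions of my own; the function name `consensus_em`, the one-parameter-per-coder accuracy formulation, and the EM fitting procedure are illustrative choices and are not taken from the paper, whose actual model and estimation method may differ.

```python
"""Minimal sketch of consensus aggregation for sparse binary codes,
assuming a one-parameter-per-coder accuracy model fit by EM.
Illustrative only; not the paper's actual model."""
import numpy as np

def consensus_em(triples, n_items, n_coders, n_iter=50):
    """triples: list of (item, coder, label) with label in {0, 1};
    items a coder did not rate are simply absent from the list.
    Returns (posterior P(call present) per item, accuracy per coder)."""
    q = np.full(n_items, 0.5)      # P(call present) for each recording
    acc = np.full(n_coders, 0.7)   # coder competence, initialized above chance
    pi = 0.5                       # base rate of call presence
    for _ in range(n_iter):
        # E-step: posterior probability each recording truly contains the call
        log1 = np.full(n_items, np.log(pi))
        log0 = np.full(n_items, np.log(1.0 - pi))
        for i, j, y in triples:
            p = acc[j]
            log1[i] += np.log(p if y == 1 else 1.0 - p)
            log0[i] += np.log(p if y == 0 else 1.0 - p)
        q = 1.0 / (1.0 + np.exp(log0 - log1))
        # M-step: re-estimate each coder's accuracy and the base rate
        hits = np.zeros(n_coders)
        counts = np.zeros(n_coders)
        for i, j, y in triples:
            hits[j] += q[i] if y == 1 else 1.0 - q[i]
            counts[j] += 1.0
        acc = np.clip(hits / np.maximum(counts, 1.0), 0.01, 0.99)
        pi = float(np.clip(q.mean(), 0.01, 0.99))
    return q, acc

# Toy usage: three coders, two recordings; coder 2 dissents on item 0
data = [(0, 0, 1), (0, 1, 1), (0, 2, 0), (1, 0, 0), (1, 1, 0), (1, 2, 0)]
presence, competence = consensus_em(data, n_items=2, n_coders=3)
```

Because coder competence is estimated from agreement with the inferred consensus rather than against a labeled answer key, this style of model can rank coders and recordings without access to ground truth, which is the property the abstract highlights.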