Using a cognitive model to understand crowdsourced data from citizen scientists

Behav Res Methods. 2023 Nov 29. doi: 10.3758/s13428-023-02289-w. Online ahead of print.

Abstract

Threatened species monitoring can produce enormous quantities of acoustic and visual recordings, which must be searched for animal detections. Data coding is extremely time-consuming for humans, and although machine-learning algorithms are emerging as useful tools for this task, they too require large numbers of known detections for training. Citizen scientists are often recruited via crowdsourcing to assist. However, the results of their coding can be difficult to interpret because citizen scientists lack comprehensive training and typically each codes only a small fraction of the full dataset. Competence may vary between citizen scientists, but without knowing the ground truth of the dataset, it is difficult to identify which citizen scientists are most competent. We used a quantitative cognitive model, cultural consensus theory, to analyze both empirical and simulated data from a crowdsourced analysis of audio recordings of Australian frogs. Several hundred citizen scientists were asked whether the calls of nine frog species were present in 1260 brief audio recordings, though most coded only a fraction of these recordings. The model estimated characteristics of both the citizen-scientist cohort (such as individual competence) and the recordings themselves. We then compared the model's output to expert coding of the recordings and found agreement between the cohort's consensus and the expert evaluation. This finding adds to the evidence that crowdsourced analyses can be used to understand large-scale datasets even when the ground truth is unknown. The model-based analysis provides a promising tool for screening large datasets before investing expert time and resources.

Keywords: Citizen science; Crowdsourcing; Data aggregation; Data analysis.
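
To illustrate the kind of aggregation cultural consensus theory performs, the sketch below fits a simplified consensus model to sparse binary presence/absence codes: each recording has a latent truth, each coder has a single competence (accuracy) parameter, and both are estimated jointly by expectation-maximization. This is a minimal sketch under assumptions of my own; the function name `consensus_em`, the one-parameter-per-coder accuracy formulation, and the EM fitting procedure are illustrative choices and are not taken from the paper, whose actual model and estimation method may differ.

```python
"""Minimal sketch of consensus aggregation for sparse binary codes,
assuming a one-parameter-per-coder accuracy model fit by EM.
Illustrative only; not the paper's actual model."""
import numpy as np

def consensus_em(triples, n_items, n_coders, n_iter=50):
    """triples: list of (item, coder, label) with label in {0, 1};
    items a coder did not rate are simply absent from the list.
    Returns (posterior P(call present) per item, accuracy per coder)."""
    q = np.full(n_items, 0.5)      # P(call present) for each recording
    acc = np.full(n_coders, 0.7)   # coder competence, initialized above chance
    pi = 0.5                       # base rate of call presence
    for _ in range(n_iter):
        # E-step: posterior probability each recording truly contains the call
        log1 = np.full(n_items, np.log(pi))
        log0 = np.full(n_items, np.log(1.0 - pi))
        for i, j, y in triples:
            p = acc[j]
            log1[i] += np.log(p if y == 1 else 1.0 - p)
            log0[i] += np.log(p if y == 0 else 1.0 - p)
        q = 1.0 / (1.0 + np.exp(log0 - log1))
        # M-step: re-estimate each coder's accuracy and the base rate
        hits = np.zeros(n_coders)
        counts = np.zeros(n_coders)
        for i, j, y in triples:
            hits[j] += q[i] if y == 1 else 1.0 - q[i]
            counts[j] += 1.0
        acc = np.clip(hits / np.maximum(counts, 1.0), 0.01, 0.99)
        pi = float(np.clip(q.mean(), 0.01, 0.99))
    return q, acc

# Toy usage: three coders, two recordings; coder 2 dissents on item 0
data = [(0, 0, 1), (0, 1, 1), (0, 2, 0), (1, 0, 0), (1, 1, 0), (1, 2, 0)]
presence, competence = consensus_em(data, n_items=2, n_coders=3)
```

Because coder competence is estimated from agreement with the inferred consensus rather than against a labeled answer key, this style of model can rank coders and recordings without access to ground truth, which is the property the abstract highlights.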