A Crowdsourcing Framework for Medical Data Sets

Cheng Ye; Joseph Coco; Anna Epishova; Chen Hajaj; Henry Bogardus; Laurie Novak; Joshua Denny; Yevgeniy Vorobeychik; Thomas Lasko; Bradley Malin; Daniel Fabbri

AMIA Jt Summits Transl Sci Proc. 2018 May 18;2017:273-280. eCollection 2018.

Affiliation

¹ Vanderbilt University, Nashville, TN, USA.

PMID: 29888085
PMCID: PMC5961774

Abstract

Crowdsourcing services like Amazon Mechanical Turk allow researchers to pose questions to crowds of workers and quickly receive high-quality labeled responses. However, crowds drawn from the general public are not suitable for labeling sensitive and complex data sets, such as medical records, because of privacy and expertise concerns. Major challenges in building and deploying a crowdsourcing system for medical data include, but are not limited to: managing access rights to sensitive data and ensuring that data privacy controls are enforced; identifying workers with the necessary expertise to analyze complex information; and efficiently retrieving relevant information from massive data sets. In this paper, we introduce a crowdsourcing framework to support the annotation of medical data sets. We further demonstrate a workflow for crowdsourcing clinical chart reviews, including (1) the design and decomposition of research questions; (2) the architecture for storing and displaying sensitive data; and (3) the development of tools to support crowd workers in quickly analyzing information from complex data sets.

Grants and funding

UH2 CA203708/CA/NCI NIH HHS/United States