Democratizing AI: non-expert design of prediction tasks

PeerJ Comput Sci. 2020 Sep 7;6:e296. doi: 10.7717/peerj-cs.296. eCollection 2020.

Abstract

Non-experts have long made important contributions to machine learning (ML) by contributing training data, and recent work has shown that non-experts can also help with feature engineering by suggesting novel predictive features. However, non-experts have so far only contributed features to prediction tasks already posed by experienced ML practitioners. Here we study how non-experts can design prediction tasks themselves, what types of tasks they will design, and whether predictive models can be automatically trained on data sourced for those tasks. We use a crowdsourcing platform where non-experts design prediction tasks that are then categorized and ranked by the crowd. Crowdsourced data are collected for top-ranked tasks, and predictive models are then trained and evaluated automatically on those data. We show that individuals without ML experience can collectively construct useful datasets and that predictive models can be learned from these datasets, although challenges remain. The prediction tasks designed by non-experts covered a broad range of domains, from politics and current events to health behavior, demographics, and more. Because proper instructions are crucial for non-experts, we also conducted a randomized trial to understand how different instructions influence the types of prediction tasks proposed. More broadly, a better understanding of how non-experts can contribute to ML can help practitioners leverage advances in automatic machine learning (AutoML) and has important implications as ML continues to drive workplace automation.
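To make the automatic train-and-evaluate step concrete, the following is a minimal sketch of how a predictive model might be fit to a crowdsourced dataset of this kind. The file name, column name, and model choice here are illustrative assumptions, not the paper's actual pipeline; a full AutoML system would search over models and hyperparameters rather than fix a single baseline.

```python
# Minimal sketch: automatically train and evaluate a model on a crowdsourced
# dataset. "crowdsourced_task.csv" and the "label" column are hypothetical
# placeholders, not artifacts from the paper.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Each row is one crowd worker's response: feature answers plus the target label.
df = pd.read_csv("crowdsourced_task.csv")
X = df.drop(columns=["label"])
y = df["label"]

# Encode categorical survey answers; pass numeric answers through unchanged.
categorical = X.select_dtypes(include="object").columns.tolist()
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

# A simple baseline classifier stands in for a full AutoML search.
model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Cross-validated accuracy approximates the automatic evaluation step.
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```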

Keywords: Amazon Mechanical Turk; AutoML; Automatic machine learning; Citizen science; Crowdsourcing; Interactive machine learning; Novel data collection; Predictive models; Randomized controlled trial; Supervised learning.

Associated data

  • figshare: https://doi.org/10.6084/m9.figshare.9468512.v1

Grants and funding

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1447634 and by Google Open Source under the Open-Source Complex Ecosystems And Networks (OCEAN) project. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.