Data analysis and modeling pipelines for controlled networked social science experiments

Vanessa Cedeno-Mieles; Zhihao Hu; Yihui Ren; Xinwei Deng; Noshir Contractor; Saliya Ekanayake; Joshua M Epstein; Brian J Goode; Gizem Korkmaz; Chris J Kuhlman; Dustin Machi; Michael Macy; Madhav V Marathe; Naren Ramakrishnan; Parang Saraf; Nathan Self

doi:10.1371/journal.pone.0242453

PLoS One. 2020 Nov 24;15(11):e0242453. doi: 10.1371/journal.pone.0242453. eCollection 2020.

Authors

Vanessa Cedeno-Mieles^{1,2}, Zhihao Hu³, Yihui Ren⁴, Xinwei Deng³, Noshir Contractor⁵, Saliya Ekanayake⁶, Joshua M Epstein⁷, Brian J Goode⁸, Gizem Korkmaz⁹, Chris J Kuhlman⁹, Dustin Machi⁹, Michael Macy¹⁰, Madhav V Marathe^{9,11}, Naren Ramakrishnan^{1,12}, Parang Saraf¹², Nathan Self¹²

Affiliations

¹ Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America.
² Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador.
³ Department of Statistics, Virginia Tech, Blacksburg, VA, United States of America.
⁴ Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States of America.
⁵ Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, United States of America.
⁶ Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America.
⁷ Department of Epidemiology, New York University, New York, NY, United States of America.
⁸ Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States of America.
⁹ Biocomplexity Institute & Initiative, University of Virginia, Charlottesville, VA, United States of America.
¹⁰ Department of Sociology, Cornell University, Ithaca, NY, United States of America.
¹¹ Department of Computer Science, University of Virginia, Charlottesville, VA, United States of America.
¹² Discovery Analytics Center, Virginia Tech, Blacksburg, VA, United States of America.

Abstract

There is considerable interest in networked social science experiments for understanding human behavior at scale. Significant effort is required to perform data analytics on experimental outputs and to build computational models of custom experiments. Moreover, experiments and modeling are often performed in a cycle, enabling iterative experimental refinement and data modeling to uncover interesting insights and to generate or refute hypotheses about social behaviors. The current practice for social analysts is to develop tailor-made computer programs and analytical scripts for experiments and modeling. This often leads to inefficiencies and duplication of effort. In this work, we propose a pipeline framework to take a significant step towards overcoming these challenges. Our contribution is to describe the design and implementation of a software system to automate many of the steps involved in analyzing social science experimental data, building models to capture the behavior of human subjects, and providing data to test hypotheses. The proposed pipeline framework uses formal models, formal algorithms, and theoretical models as the basis for its design and implementation. We propose a formal data model such that, if an experiment can be described in terms of this model, our pipeline software can be used to analyze its data efficiently. The merits of the proposed pipeline framework are illustrated by several case studies of networked social science experiments.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Electronic Data Processing*
Humans
Models, Theoretical*
Social Behavior*
Social Sciences / methods*
Software*

Grants and funding

This work has been partially supported by DARPA Cooperative Agreement D17AC00003 (NGS2), DTRA CNIMS (Contract HDTRA1-11-D-0016-0001), NSF DIBBS Grant ACI-1443054, NSF BIG DATA Grant IIS-1633028, NSF CRISP 2.0 Grant 1916670, NSF Grants DGE-1545362 and IIS-1633363, and ARL Grant W911NF-17-1-0021. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, DTRA, NSF, ARL, or the U.S. Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.