Estimating the outcome of spreading processes on networks with incomplete information: A dimensionality reduction approach

Anna Sapienza; Alain Barrat; Ciro Cattuto; Laetitia Gauvin

doi:10.1103/PhysRevE.98.012317

Phys Rev E. 2018 Jul;98(1-1):012317. doi: 10.1103/PhysRevE.98.012317.

Authors

Anna Sapienza¹, Alain Barrat², Ciro Cattuto³, Laetitia Gauvin³

Affiliations

¹ Information Sciences Institute, Viterbi School of Engineering, University of Southern California, Marina del Rey, California 90292, USA and Data Science Laboratory, ISI Foundation, 10126 Turin, Italy.
² Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France and Data Science Laboratory, ISI Foundation, Turin, Italy.
³ Data Science Laboratory, ISI Foundation, 10126 Turin, Italy.

PMID: 30110805
DOI: 10.1103/PhysRevE.98.012317

Abstract

Recent advances in data collection have facilitated access to time-resolved human proximity data that can conveniently be represented as temporal networks of contacts between individuals. While the structural and dynamical information revealed by this type of data is fundamental for investigating how information or diseases propagate in a population, such data are often incomplete, which can bias the estimates produced by data-driven models. A major challenge is thus to estimate the outcome of spreading processes occurring on temporal networks built from partial information. To cope with this problem, we devise an approach based on non-negative tensor factorization, a dimensionality reduction technique from multilinear algebra. The key idea is to learn a low-dimensional representation of the temporal network built from partial information and to use it to construct a surrogate network similar to the complete original network. To test our method, we consider several human-proximity networks, on which we perform resampling experiments to simulate a loss of data. Applying our approach to each of the resulting partial networks, we build a surrogate version of the corresponding complete network. We then compare the outcome of a spreading process on the complete networks (unaltered by data loss) and on the surrogate networks. We observe that the epidemic sizes obtained using the surrogate networks are in good agreement with those measured on the complete networks. Finally, we propose an extension of our framework that can leverage additional data, when available, to improve the surrogate network when the data loss is particularly large.
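
The abstract names non-negative tensor factorization but does not specify an algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a temporal network stored as a node × node × time tensor `T` and fits a rank-`R` non-negative CP model with generic multiplicative updates. The function names (`gram`, `update_factor`, `ntf`, `reconstruct`, `sq_error`), the toy data, and the choice of update rule are all illustrative assumptions; how the paper turns the low-dimensional representation into a surrogate network is not described in the abstract and is not attempted here.

```python
import random

def gram(M, R):
    """R x R matrix M^T M for a factor matrix M stored as a list of rows."""
    return [[sum(row[p] * row[q] for row in M) for q in range(R)] for p in range(R)]

def update_factor(entry, F, G, H, R, eps=1e-12):
    """One multiplicative update of factor F with G and H held fixed.

    entry(f, g, h) returns the tensor value indexed by F's mode first,
    then G's mode, then H's mode.
    """
    GG, HH = gram(G, R), gram(H, R)
    for i in range(len(F)):
        num = [sum(entry(i, j, k) * G[j][r] * H[k][r]
                   for j in range(len(G)) for k in range(len(H)))
               for r in range(R)]
        den = [sum(F[i][q] * GG[q][r] * HH[q][r] for q in range(R)) + eps
               for r in range(R)]
        # Multiplying by a non-negative ratio keeps every entry non-negative.
        F[i] = [F[i][r] * num[r] / den[r] for r in range(R)]

def ntf(T, R, n_iter=50, seed=0):
    """Rank-R non-negative CP factors (A, B, C) of a 3-way tensor T (nested lists)."""
    rng = random.Random(seed)
    n1, n2, n3 = len(T), len(T[0]), len(T[0][0])
    A = [[rng.random() + 0.1 for _ in range(R)] for _ in range(n1)]
    B = [[rng.random() + 0.1 for _ in range(R)] for _ in range(n2)]
    C = [[rng.random() + 0.1 for _ in range(R)] for _ in range(n3)]
    for _ in range(n_iter):
        update_factor(lambda i, j, k: T[i][j][k], A, B, C, R)
        update_factor(lambda j, i, k: T[i][j][k], B, A, C, R)
        update_factor(lambda k, i, j: T[i][j][k], C, A, B, R)
    return A, B, C

def reconstruct(A, B, C):
    """Low-rank approximation: sum over components of a_r (outer) b_r (outer) c_r."""
    R = len(A[0])
    return [[[sum(a[r] * b[r] * c[r] for r in range(R)) for c in C]
             for b in B] for a in A]

def sq_error(T, A, B, C):
    """Squared Frobenius distance between T and its reconstruction."""
    T_hat = reconstruct(A, B, C)
    return sum((T[i][j][k] - T_hat[i][j][k]) ** 2
               for i in range(len(T))
               for j in range(len(T[0]))
               for k in range(len(T[0][0])))

# Toy temporal network (hypothetical data): 4 nodes, 6 time steps.
# Nodes 0 and 1 are in contact during steps 0-2, nodes 2 and 3 during steps 3-5.
N, S = 4, 6
T = [[[0.0] * S for _ in range(N)] for _ in range(N)]
for t in range(3):
    T[0][1][t] = T[1][0][t] = 1.0
for t in range(3, 6):
    T[2][3][t] = T[3][2][t] = 1.0

A, B, C = ntf(T, R=2, n_iter=50, seed=0)
print("squared reconstruction error:", sq_error(T, A, B, C))
```

In this sketch each component pairs a group of nodes (columns of `A` and `B`) with an activity profile over time (the matching column of `C`), which is the kind of low-dimensional representation the abstract refers to. A library implementation would normally be preferable to these hand-written updates for networks of realistic size.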