Cost-Efficient and Multi-Functional Secure Aggregation in Large Scale Distributed Application

PLoS One. 2016 Aug 23;11(8):e0159605. doi: 10.1371/journal.pone.0159605. eCollection 2016.

Abstract

Secure aggregation is an essential component of modern distributed applications and data mining platforms. In data warehouse platforms, aggregated statistical results are typically used to construct a data cube for analysis at multiple abstraction levels. Generating different types of statistical results efficiently at the same time (that is, providing multi-functional support) is a fundamental requirement in practice. However, most existing schemes support only a very limited number of statistics. Securely obtaining the typical statistical results simultaneously in a distributed system, without recovering the original data, is still an open problem. In this paper, we present SEDAR, a SEcure Data Aggregation scheme under the Range segmentation model. The range segmentation model is proposed to reduce communication cost by capturing the characteristics of the data, and each range uses a different aggregation strategy. For raw data in the dominant range, SEDAR encodes the values into well-defined vectors that are value-preserving and order-preserving, which provides the basis for multi-functional aggregation. A homomorphic encryption scheme is used to achieve data privacy. We also present two enhanced versions: a Random based SEDAR (REDAR) and a Compression based SEDAR (CEDAR). Both can significantly reduce communication cost, at the price of lower security and lower accuracy, respectively. Experimental evaluations on six different scenes of real data show that all three schemes have excellent performance in terms of cost and accuracy.
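The abstract does not specify SEDAR's vector encoding or which homomorphic cryptosystem it uses, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a one-hot (histogram-style) encoding over a hypothetical dominant range `LOW..HIGH` and a toy Paillier cryptosystem with tiny primes, which is not secure and stands in for whatever additively homomorphic scheme the paper actually adopts. All names (`encode`, `aggregate`, `statistics`, and so on) are illustrative and do not come from the paper. Values outside the dominant range, which the paper handles with a different strategy, are out of scope here.

```python
import random
from math import gcd

# Toy Paillier parameters: tiny primes for illustration only, NOT secure.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse; valid because the generator is N + 1

def encrypt(m):
    """Encrypt a small non-negative integer m < N."""
    r = random.randrange(1, N)
    while gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2

# Hypothetical dominant range (an assumption, not taken from the paper).
LOW, HIGH = 20, 25

def encode(value):
    """One-hot vector: position i is 1 if value == LOW + i, else 0."""
    return [1 if value == LOW + i else 0 for i in range(HIGH - LOW + 1)]

def aggregate(encrypted_vectors):
    """Component-wise homomorphic sum; the aggregator sees no plaintext."""
    it = iter(encrypted_vectors)
    total = list(next(it))
    for vec in it:
        total = [add_encrypted(a, b) for a, b in zip(total, vec)]
    return total

def statistics(histogram):
    """Derive several statistics from the decrypted histogram.

    Assumes at least one reading fell inside the dominant range.
    """
    # Expanding by ascending index yields the readings in sorted order.
    values = [LOW + i for i, c in enumerate(histogram) for _ in range(c)]
    count = len(values)
    return {
        "count": count,
        "sum": sum(values),
        "min": values[0],
        "max": values[-1],
        "median": values[(count - 1) // 2],  # lower median
    }
```

In this sketch each node would call `encode` on its reading and encrypt every component; intermediate nodes call `aggregate` on ciphertexts only, and the key holder decrypts the final vector and passes it to `statistics`. Because each vector position corresponds to one value in ascending order, the summed vector preserves both the values and their order, which is why count, sum, minimum, maximum, and median can all be read from a single aggregate.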

MeSH terms

  • Algorithms
  • Computer Communication Networks / statistics & numerical data*
  • Costs and Cost Analysis
  • Data Collection / statistics & numerical data*
  • Data Compression
  • Data Mining / statistics & numerical data*
  • Humans
  • Models, Theoretical*

Grants and funding

This work is supported by the Foundation of Hunan Educational Committee (14C0484), the National Natural Science Foundation of China (Grant 61502054), the Yongzhou Science and Technology Plan ([2013]3), the Open Research Fund of Hunan Provincial Key Laboratory of Network Investigational Technology (2016WLZC016), and the Foundation of Hunan University of Science and Engineering (13XKYTA003).