Distributed Function Mining for Gene Expression Programming Based on Fast Reduction

PLoS One. 2016 Jan 11;11(1):e0146698. doi: 10.1371/journal.pone.0146698. eCollection 2016.

Abstract

For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) and its improved variants suffer from increased run time and decreased prediction accuracy. To address this problem, this paper proposes an improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and a function consistency replacement algorithm is given to integrate the local function models. Comparative experiments covering DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are reported. For the waveform, mushroom, connect-4 and musk datasets, the average time consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, compared with centralized GEP, and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Experiments on six well-studied UCI test data sets demonstrate the efficiency and capability of the proposed DFMGEP-FR algorithm for distributed function mining.
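
As background for readers unfamiliar with GEP, the sketch below shows how a standard GEP gene (a K-expression in Karva notation) is decoded breadth-first into an expression tree and evaluated. It is a minimal illustration of the general GEP encoding only, not of DFMGEP-FR, FAR-BSA or the function consistency replacement algorithm, whose details the abstract does not give. The function set, the protected-division convention and the names `decode`, `evaluate` and `run` are assumptions made for this example.

```python
from collections import deque
import operator


def protected_div(a, b):
    # Assumption: division by zero yields 1.0 (a common convention in
    # genetic programming; the abstract does not specify one).
    return a / b if b != 0 else 1.0


# Hypothetical function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, protected_div),
}


def decode(kexpr):
    """Build an expression tree from a K-expression, level by level.

    Each node is a (symbol, children) pair. Symbols beyond the ones
    needed to complete the tree (the non-coding region) are ignored.
    The input is assumed to be a valid gene with enough terminals.
    """
    nodes = [(sym, []) for sym in kexpr]
    root = nodes[0]
    queue = deque([root])
    next_free = 1  # index of the next symbol not yet placed in the tree
    while queue:
        sym, children = queue.popleft()
        arity = FUNCTIONS[sym][0] if sym in FUNCTIONS else 0
        for _ in range(arity):
            child = nodes[next_free]
            next_free += 1
            children.append(child)
            queue.append(child)
    return root


def evaluate(node, env):
    """Recursively evaluate a decoded tree; env maps terminals to values."""
    sym, children = node
    if sym in FUNCTIONS:
        _, fn = FUNCTIONS[sym]
        return fn(*(evaluate(child, env) for child in children))
    return env[sym]


def run(kexpr, env):
    """Decode and evaluate a K-expression in one step."""
    return evaluate(decode(kexpr), env)
```

For example, `run("+*cab", {"a": 2, "b": 3, "c": 4})` decodes the gene to `(a * b) + c`. Appending extra symbols, as in `"+*cabba"`, leaves the result unchanged because the trailing symbols fall in the non-coding region.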

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • Data Mining / methods*
  • Gene Expression Regulation*
  • Least-Squares Analysis
  • Models, Molecular
  • Models, Statistical
  • Reproducibility of Results

Grants and funding

This work was supported by the National Natural Science Foundation of P.R. China under grant Nos. 51507084 and 61202354, and by NUPTSF under grant No. NY214203.