Distributed Function Mining for Gene Expression Programming Based on Fast Reduction

Song Deng; Dong Yue; Le-chan Yang; Xiong Fu; Ya-zhou Feng

doi:10.1371/journal.pone.0146698

PLoS One. 2016 Jan 11;11(1):e0146698. doi: 10.1371/journal.pone.0146698. eCollection 2016.

Authors

Song Deng¹, Dong Yue¹, Le-chan Yang², Xiong Fu³, Ya-zhou Feng¹

Affiliations

¹ Institute of Advanced Technology, Nanjing University Post & Telecommunication, Nanjing, 210023, China.
² International Institute for Earth System Science, Nanjing University, Nanjing, 210093, China.
³ School of Computer, Nanjing University Post & Telecommunication, Nanjing, 210023, China.

Abstract

For high-dimensional, massive data sets, traditional centralized gene expression programming (GEP) and its improved variants suffer from increased run time and decreased prediction accuracy. To solve this problem, this paper proposes a new algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and a function consistency replacement algorithm is given to integrate the local function models. Comparative experiments on DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included. For the waveform, mushroom, connect-4 and musk datasets, the results show that the average time consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, compared with centralized GEP, and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Experiments on six well-studied UCI test data sets demonstrate the efficiency and capability of the proposed DFMGEP-FR algorithm for distributed function mining.
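
The abstract does not describe the internals of FAR-BSA, so the following is not the authors' algorithm. It is only a minimal sketch of how binary search can, in general, be applied to attribute reduction, assuming the attributes have already been ranked by some importance measure and that a reduced set is acceptable when no two rows with equal values on it carry different class labels. The names `is_consistent`, `smallest_consistent_prefix` and `ranked_attrs` are hypothetical and chosen for illustration.

```python
from typing import Hashable, List, Sequence


def is_consistent(rows: Sequence[Sequence[Hashable]],
                  labels: Sequence[Hashable],
                  attrs: Sequence[int]) -> bool:
    """Return True if rows that agree on `attrs` never carry different labels."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True


def smallest_consistent_prefix(rows: Sequence[Sequence[Hashable]],
                               labels: Sequence[Hashable],
                               ranked_attrs: Sequence[int]) -> List[int]:
    """Binary-search the shortest prefix of `ranked_attrs` that is consistent.

    Binary search is valid here because consistency is monotone: adding
    attributes to a consistent set cannot make it inconsistent.
    ASSUMPTION: `ranked_attrs` is ordered by some importance measure
    supplied by the caller; the ranking itself is not part of this sketch.
    """
    if not is_consistent(rows, labels, ranked_attrs):
        # Even the full attribute set has conflicting rows; nothing to reduce.
        return list(ranked_attrs)
    lo, hi = 0, len(ranked_attrs)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_consistent(rows, labels, ranked_attrs[:mid]):
            hi = mid          # a prefix of length mid suffices; try shorter
        else:
            lo = mid + 1      # too short; need more attributes
    return list(ranked_attrs[:lo])


# Toy data: the label is the XOR of attributes 0 and 1; attribute 2 is redundant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0]
reduced = smallest_consistent_prefix(rows, labels, [0, 1, 2])
```

On the toy data, `reduced` keeps attributes 0 and 1 and drops attribute 2. The result is the shortest consistent prefix of the given ranking, which is not necessarily a globally minimal attribute set; it needs only a logarithmic number of consistency checks in the number of attributes.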

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computational Biology / methods*
Computer Simulation
Data Mining / methods*
Gene Expression Regulation*
Least-Squares Analysis
Models, Molecular
Models, Statistical
Reproducibility of Results

Grants and funding

This work was supported by the National Natural Science Foundation of P.R. China under grant Nos. 51507084 and 61202354, and by NUPTSF under grant No. NY214203.