Ensemble stacking rockburst prediction model based on Yeo-Johnson, K-means SMOTE, and optimal rockburst feature dimension determination

Sci Rep. 2022 Sep 12;12(1):15352. doi: 10.1038/s41598-022-19669-5.

Abstract

Rockburst forecasting plays a crucial role in the prevention and control of rockburst disasters. To improve the accuracy of rockburst prediction, this study works at two levels. At the data-structure level, the Yeo-Johnson transform, K-means SMOTE oversampling, and optimal rockburst feature dimension determination are used to optimize the data. At the algorithm level, ensemble stacking is applied to the optimized data. First, because rockburst data contain many outliers and are class-imbalanced, the Yeo-Johnson transform is used to address the outliers and the K-means SMOTE algorithm to address the imbalance. Then, 21 new features are generated from the six original rockburst features using the PolynomialFeatures transformer in Sklearn, and principal component analysis (PCA) is applied to eliminate the correlations among the resulting 27 features. Thirteen machine learning algorithms are used to make predictions on datasets that retain different numbers of features after dimensionality reduction, in order to determine the optimal rockburst feature dimension. Finally, the 14-feature rockburst dataset is used as the input to ensemble stacking. The results show that, compared with the 13 single machine learning models trained without data preprocessing, the ensemble stacking model based on Yeo-Johnson, K-means SMOTE, and optimal rockburst feature dimension determination improves rockburst prediction accuracy by 0.1602-0.3636, indicating that the combined data-structure and algorithm optimization is effective.
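The preprocessing chain described in the abstract (Yeo-Johnson transform, K-means SMOTE oversampling, degree-2 polynomial expansion of six features into 27, and PCA down to 14 components) can be sketched in NumPy as below. This is a hypothetical illustration, not the authors' implementation: the function names, the grid search for λ, the cluster count, the standardization before PCA, and the oversampling details are all assumptions. In particular, `kmeans_smote` is a simplified stand-in that interpolates between a random same-class pair inside each k-means cluster; it skips the cluster filtering, nearest-neighbour selection, and density-based sample allocation of the full K-means SMOTE algorithm. Library versions of these steps exist as scikit-learn's `PowerTransformer(method="yeo-johnson")`, `PolynomialFeatures(degree=2, include_bias=False)` and `PCA(n_components=14)`, and imbalanced-learn's `KMeansSMOTE`.

```python
import numpy as np
from itertools import combinations_with_replacement


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a 1-D array for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:
        out[pos] = ((x[pos] + 1) ** lam - 1) / lam
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2) > 1e-12:
        out[~pos] = -((1 - x[~pos]) ** (2 - lam) - 1) / (2 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


def yj_loglik(x, lam):
    """Profile log-likelihood of lambda, assuming the transformed column is normal."""
    t = yeo_johnson(x, lam)
    return -0.5 * len(x) * np.log(t.var()) + (lam - 1) * np.sum(np.sign(x) * np.log1p(np.abs(x)))


def fit_yeo_johnson(X, grid=np.linspace(-2, 2, 81)):
    """Per-column lambda by grid search (assumption; PowerTransformer optimizes instead)."""
    lams = [max(grid, key=lambda lam: yj_loglik(col, lam)) for col in X.T]
    return np.column_stack([yeo_johnson(col, lam) for col, lam in zip(X.T, lams)]), lams


def kmeans(X, k, rng, iters=50):
    """Plain Lloyd's k-means; returns a cluster label per row."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def kmeans_smote(X, y, k, rng):
    """Simplified K-means SMOTE: cluster all rows, then interpolate between
    same-class pairs inside clusters until every class matches the largest."""
    clusters = kmeans(X, k, rng)
    classes, counts = np.unique(y, return_counts=True)
    new_X, new_y = [], []
    for c, n_c in zip(classes, counts):
        need = counts.max() - n_c
        pools = [np.flatnonzero((clusters == j) & (y == c)) for j in range(k)]
        pools = [p for p in pools if len(p) >= 2]  # interpolation needs two points
        if need == 0 or not pools:
            continue
        for i in range(need):
            a, b = rng.choice(pools[i % len(pools)], 2, replace=False)  # round-robin
            new_X.append(X[a] + rng.random() * (X[b] - X[a]))  # SMOTE-style point
            new_y.append(c)
    if not new_X:
        return X, y
    return np.vstack([X, np.array(new_X)]), np.concatenate([y, np.array(new_y)])


def poly2(X):
    """Original columns plus all squares and pairwise products, the same column
    count as PolynomialFeatures(degree=2, include_bias=False): 6 -> 6 + 21 = 27."""
    pairs = combinations_with_replacement(range(X.shape[1]), 2)
    return np.hstack([X, np.column_stack([X[:, i] * X[:, j] for i, j in pairs])])


def pca(X, k):
    """Project centred data onto its top-k principal axes (scores are uncorrelated)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def preprocess(X, y, n_components=14, k_clusters=4, seed=0):
    """Yeo-Johnson -> simplified K-means SMOTE -> degree-2 expansion -> PCA."""
    rng = np.random.default_rng(seed)
    X_yj, _ = fit_yeo_johnson(X)
    X_bal, y_bal = kmeans_smote(X_yj, y, k_clusters, rng)
    X_poly = poly2(X_bal)
    X_poly = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)  # assumed scaling step
    return pca(X_poly, n_components), y_bal
```

With 60 synthetic rows of six positive features and class sizes 30/15/10/5, for example, `preprocess` returns 120 rows (every class oversampled to 30) and 14 mutually uncorrelated columns. Because the oversampling and PCA here are fitted on the whole dataset, a fair comparison of the 13 classifiers or of the stacking model would refit these steps inside each training fold so that synthetic samples never leak into the test data.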

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Forecasting
  • Machine Learning*