Data Imputation of Soil Pressure on Shield Tunnel Lining Based on Random Forest Model

Sensors (Basel). 2024 Feb 28;24(5):1560. doi: 10.3390/s24051560.

Abstract

Shield tunneling projects increasingly incorporate emerging technologies to monitor forces and displacements during the construction and operation phases. Monitoring devices installed on the tunnel segments generate a large amount of data, but for various reasons some of the data may be missing. Completing the incomplete data is therefore important for the safety of the engineering project. In this research, a missing data imputation technique based on Random Forest (RF) is introduced. The optimal combination of the number of decision trees, maximum depth, and number of features in the RF is determined by minimizing the Mean Squared Error (MSE). Subsequently, values are deliberately removed from complete soil pressure data to create incomplete datasets with missing rates of 20%, 40%, and 60%. A comparative analysis of the imputation results from three methods (median, mean, and RF) reveals that the proposed method has the smallest imputation error. As the missing rate increases, the MSE of the RF method and of the other two methods also increases, with a maximum difference of about 70%. This indicates that the RF method is suitable for imputing monitoring data.
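The workflow summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the use of neighbouring sensor channels as predictors is an assumption, the grid values are arbitrary, and every function name (`make_synthetic_pressure`, `mask_column`, `impute_rf`, `compare`) is hypothetical. It relies on scikit-learn's `RandomForestRegressor` and `GridSearchCV`, tunes the number of trees, maximum depth, and number of features by minimizing MSE, and compares the result with mean and median filling.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def make_synthetic_pressure(n_samples=300, seed=0):
    """Hypothetical stand-in for monitoring data: 4 correlated sensor channels."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 6 * np.pi, n_samples)
    base = 100 + 20 * np.sin(t)
    scales = (1.0, 0.8, 1.2, 0.9)
    return np.column_stack(
        [base * s + rng.normal(0, 1.0, n_samples) for s in scales]
    )


def mask_column(X, col, rate, seed=0):
    """Set a random fraction `rate` of one column to NaN."""
    rng = np.random.default_rng(seed)
    X_missing = X.copy()
    n_drop = int(round(rate * X.shape[0]))
    idx = rng.choice(X.shape[0], size=n_drop, replace=False)
    X_missing[idx, col] = np.nan
    return X_missing


def impute_rf(X_missing, col):
    """Predict the missing entries of `col` from the other columns (assumed complete)."""
    missing = np.isnan(X_missing[:, col])
    others = np.delete(X_missing, col, axis=1)
    # Illustrative grid only; the paper's search ranges are not given in the abstract.
    grid = {"n_estimators": [50, 100], "max_depth": [4, 8], "max_features": [1, 2]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        grid,
        scoring="neg_mean_squared_error",  # i.e. pick the combination with lowest MSE
        cv=3,
    )
    search.fit(others[~missing], X_missing[~missing, col])
    return search.predict(others[missing])


def compare(rate, col=0):
    """Return the imputation MSE of mean, median, and RF filling at a given missing rate."""
    X = make_synthetic_pressure()
    X_missing = mask_column(X, col, rate)
    missing = np.isnan(X_missing[:, col])
    truth = X[missing, col]
    observed = X_missing[~missing, col]
    return {
        "mean": float(np.mean((truth - observed.mean()) ** 2)),
        "median": float(np.mean((truth - np.median(observed)) ** 2)),
        "rf": float(np.mean((truth - impute_rf(X_missing, col)) ** 2)),
    }


if __name__ == "__main__":
    for rate in (0.2, 0.4, 0.6):
        print(rate, compare(rate))
```

Because the synthetic channels are strongly correlated by construction, the RF error here is much lower than that of mean or median filling; the size of that gap says nothing about the paper's field data.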

Keywords: imputation; missing data; random forest; shield tunnel; soil pressure.