Exploring statistical and machine learning techniques to identify factors influencing indoor radon concentration

Sci Total Environ. 2023 Dec 20:905:167024. doi: 10.1016/j.scitotenv.2023.167024. Epub 2023 Sep 13.

Abstract

Radon is a radioactive gas with a carcinogenic effect, but its harm to human health depends largely on the level of exposure. Dangerous exposure occurs predominantly indoors, where the indoor radon concentration (IRC) is itself influenced by several factors. The current study investigates the combined effects of geology, pedology, and house characteristics on IRC, based on 3132 passive radon measurements conducted in Romania. Several techniques for evaluating the impact of predictors on the dependent variable were used, ranging from univariate statistics to an artificial neural network and a random forest regressor (RFR). With radon concentration treated as a continuous dependent variable, the RFR model outperformed the other investigated models in terms of R² (0.14) and RMSE (0.83). With IRC discretized into two classes at the median (115 Bq/m³), logistic regression achieved an AUC-ROC of 0.61 and the random forest classifier 0.62. The main predictors identified by the models were the presence of a cellar beneath the investigated room, the construction period, the altitude (height above sea level), and the floor type.
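The evaluation metrics named above can be illustrated with a minimal sketch. It is not the study's pipeline: it does not reimplement the neural network, random forest, or logistic regression models, and the function names (`r2_rmse`, `median_split`, `auc_roc`) and all input values are hypothetical, chosen only for illustration. R² is the share of variance in the target explained by the predictions, so a value of 0.14 means the model accounts for roughly 14% of it. AUC-ROC is the probability that a randomly chosen high-radon case receives a higher score than a randomly chosen low-radon case, so 0.5 corresponds to chance.

```python
import math
from statistics import median


def r2_rmse(y_true, y_pred):
    """Return (R², RMSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


def median_split(values):
    """Label values above the median as 1 (high) and the rest as 0 (low)."""
    m = median(values)
    return [1 if v > m else 0 for v in values], m


def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical regression example (not the study's data).
    r2, rmse = r2_rmse([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
    print(round(r2, 2), round(rmse, 3))  # 0.9 0.354

    # Hypothetical IRC values in Bq/m³ and classifier scores.
    irc = [40, 80, 115, 150, 300, 60]
    labels, m = median_split(irc)
    print(labels, m)  # [0, 0, 1, 1, 1, 0] 97.5
    scores = [0.3, 0.6, 0.5, 0.7, 0.8, 0.4]
    print(round(auc_roc(labels, scores), 3))  # 0.889
```

The pairwise AUC loop is quadratic in the number of cases but easy to check by hand. Values exactly equal to the median are assigned to the low class here, which is an assumption; the abstract does not state how ties at 115 Bq/m³ were handled.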

Keywords: Building characteristics; Indoor radon; Lithology; Logistic regression; Pedology; Random forest.