Validation of causal inference data using DirectLiNGAM in an environmental small-scale model and calculation settings

MethodsX. 2023 Dec 20:12:102528. doi: 10.1016/j.mex.2023.102528. eCollection 2024 Jun.

Abstract

Data science methods are increasingly needed in environmental fields that work with marine, weather, and soil data. Such datasets are sometimes large, but they are often small because they consist of observational data for which the analyses themselves are limited. In such cases, the data are evaluated statistically by differential analysis of increases or decreases in factor levels, from which the essential factors are estimated. However, there is no consistent approach for assessing strong associations among factors as a group. Causal inference methods can potentially yield useful results for small data, and those results are expected to provide important information for understanding potentially strong associations between factors, without necessarily requiring inference from big data. Here, we describe essential checkpoints and settings for calculation with a direct method for learning a linear non-Gaussian structural equation model (DirectLiNGAM), together with methods for validating the calculation results using small-scale model data, as an additional discussion of the DirectLiNGAM portion of the related research article. Thus, this study provides statistical validation methods for association networks, treatments, and interventions in the structural inference of essential factors as a group.

• Causal inference with DirectLiNGAM
• Validation of correlation coefficient and feature importance
• Validation using causal effect object and propensity scores
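
For orientation, the linear non-Gaussian acyclic model that DirectLiNGAM estimates is commonly written as below. This is the standard textbook form of the basic LiNGAM model, not a formula taken from this abstract; the symbols $x_i$, $b_{ij}$, $e_i$, and the causal ordering $k(\cdot)$ are notational assumptions introduced here.

```latex
% Basic LiNGAM model: each observed variable is a linear function of the
% variables that precede it in a causal ordering k(.), plus a noise term.
\begin{align}
  x_i &= \sum_{k(j) < k(i)} b_{ij}\, x_j + e_i, \qquad i = 1, \dots, p, \\
  \mathbf{x} &= B\,\mathbf{x} + \mathbf{e}.
\end{align}
% Assumptions: the e_i are mutually independent, non-Gaussian, and have
% nonzero variance; B can be permuted to a strictly lower triangular
% matrix (acyclicity), so b_{ij} is the direct effect of x_j on x_i.
```

Under these assumptions, estimating the causal ordering and the matrix $B$ yields the weighted directed graph whose edges are then checked against the correlation coefficients, feature importances, and treatment effects mentioned above.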

Keywords: Causal effect; Correlation map; DirectLiNGAM: A causal inference by direct estimation approach for learning the basic LiNGAM model with non-Gaussian data; Feature importance; Field data; LiNGAM; Treatment effect.