Causal Discovery on Discrete Data via Weighted Normalized Wasserstein Distance

IEEE Trans Neural Netw Learn Syst. 2024 Apr;35(4):4911-4923. doi: 10.1109/TNNLS.2022.3213641. Epub 2024 Apr 4.

Abstract

Causal discovery from bivariate observational data (X, Y) is the task of deciding whether X causes Y, Y causes X, or there is no causal relationship between them; it is an important problem in many areas of science. In this study, we propose a method to address this problem when the cause-and-effect relationship is represented by a discrete additive noise model (ANM). First, assuming that X causes Y, we estimate the conditional distributions of the noise given X using regression. Similarly, assuming that Y causes X, we estimate the conditional distributions of the noise given Y. Based on the structural characteristics of the discrete ANM, we find that the dissimilarity of the conditional noise distributions is smaller in the causal direction than in the anticausal direction. We then propose a weighted normalized Wasserstein distance to measure this dissimilarity. Finally, we propose a decision rule for causal discovery that compares the two computed weighted normalized Wasserstein distances. An empirical investigation demonstrates that our method performs well on synthetic data and outperforms state-of-the-art methods on real data.
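The following is a minimal sketch of the decision rule outlined above, not the authors' estimator: it assumes a simple conditional-mode regression for the discrete ANM and a count-weighted average of pairwise Wasserstein distances between the conditional noise distributions, whereas the paper defines a specific weighted normalized variant.

```python
# Sketch of the abstract's decision rule (illustrative assumptions only):
# regress effect on cause, group residuals by cause value, and score each
# direction by a count-weighted average of pairwise Wasserstein distances.
import numpy as np
from scipy.stats import wasserstein_distance


def _conditional_noise(cause, effect):
    """Regress effect on cause (conditional mode) and group residuals by cause value."""
    noise_by_value = {}
    for c in np.unique(cause):
        y_c = effect[cause == c]
        # Simple discrete regression: predict the most frequent effect value.
        values, counts = np.unique(y_c, return_counts=True)
        noise_by_value[c] = y_c - values[np.argmax(counts)]
    return noise_by_value


def direction_score(cause, effect):
    """Count-weighted average pairwise Wasserstein distance between the
    conditional noise distributions; smaller means the ANM fits better."""
    noise = _conditional_noise(cause, effect)
    keys = list(noise.keys())
    total, weight = 0.0, 0.0
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            w = len(noise[keys[i]]) * len(noise[keys[j]])
            total += w * wasserstein_distance(noise[keys[i]], noise[keys[j]])
            weight += w
    return total / weight if weight > 0 else 0.0


def infer_direction(x, y):
    """Return 'X->Y' or 'Y->X' by comparing the two direction scores."""
    return "X->Y" if direction_score(x, y) <= direction_score(y, x) else "Y->X"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 5, size=2000)
    n = rng.integers(-1, 2, size=2000)   # noise independent of x
    y = 2 * x + n                        # discrete ANM: Y = f(X) + N
    print(infer_direction(x, y))         # expected: X->Y
```

In the causal direction the residuals are (approximately) independent of the conditioning variable, so their conditional distributions nearly coincide and the score is small; in the anticausal direction no such ANM fit exists in general, so the conditional noise distributions differ and the score is larger.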