CM-GAN: A Cross-Modal Generative Adversarial Network for Imputing Completely Missing Data in Digital Industry

IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):2917-2926. doi: 10.1109/TNNLS.2023.3284666. Epub 2024 Feb 29.

Abstract

Multimodal data fusion analysis is essential for modeling the uncertainty of environment awareness in the digital industry. However, because of communication failures and cyberattacks, sampled time-series data often contain missing values. In some extreme cases, some units are unobservable for a long time, which results in complete data missing (CDM). Many models have been proposed to impute missing data. However, they cannot address the CDM issue, because no observations of the unobservable units are available in this case. To address the CDM issue, a novel cross-modal generative adversarial network (CM-GAN) is proposed in this article. It combines cross-modal data fusion with deep adversarial generation to construct a cross-modal data generator. This generator can generate long-term time-series data from the spatio-temporal modal data that are widely available in modern industrial systems, and then impute missing values by replacing them with the generated data. To test the performance of CM-GAN, extensive experiments are conducted on a photovoltaic (PV) power output dataset. Compared with the baseline models, CM-GAN generally performs better and reaches the state-of-the-art level. Moreover, ablation studies are conducted to show the contribution of the cross-modal data fusion technique and the reasonableness of the parameter settings of CM-GAN. In addition, prediction experiments are conducted. The results show that the PV data recovered by CM-GAN provide more predictability information, which improves the prediction accuracy of deep learning models.
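The replacement step described in the abstract (filling missing values with generated data) can be illustrated with a minimal sketch. This is not the CM-GAN implementation: the function name `impute_with_generated`, the use of NaN to mark missing samples, and the toy arrays are all assumptions made for illustration, and the `generated` array simply stands in for whatever a trained cross-modal generator would output.

```python
import numpy as np

def impute_with_generated(observed, generated):
    """Replace missing (NaN) entries of `observed` with values from `generated`.

    Hypothetical helper: assumes both arrays share the shape (units, time steps)
    and that missing samples are encoded as NaN.
    """
    observed = np.asarray(observed, dtype=float)
    generated = np.asarray(generated, dtype=float)
    if observed.shape != generated.shape:
        raise ValueError("observed and generated must have the same shape")
    missing = np.isnan(observed)  # True where no observation exists
    # Keep real observations; use generated values only at the missing positions.
    return np.where(missing, generated, observed)

# Toy data: rows are units, columns are time steps.
# Unit 0 has a single gap; unit 1 is completely missing (the CDM case).
observed = np.array([[0.2, np.nan, 0.5],
                     [np.nan, np.nan, np.nan]])

# Stand-in for generator output (assumed values, not model results).
generated = np.array([[0.1, 0.3, 0.4],
                      [0.6, 0.7, 0.8]])

imputed = impute_with_generated(observed, generated)
```

In this sketch the completely missing unit is filled entirely from the generated series, while the partially observed unit keeps its real samples, which is why the CDM case depends wholly on the quality of the generator.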