Combining BERT Model with Semi-Supervised Incremental Learning for Heterogeneous Knowledge Fusion of High-Speed Railway On-Board System

Comput Intell Neurosci. 2022 May 31;2022:9948218. doi: 10.1155/2022/9948218. eCollection 2022.

Abstract

The on-board system fault knowledge base (KB) is a collection of fault causes, maintenance methods, and interrelationships among the on-board modules and components of high-speed railways. It plays a crucial role in knowledge-driven dynamic operation and maintenance (O&M) decisions for on-board systems. To address the multi-source heterogeneity of on-board system O&M data, an entity matching (EM) approach using the BERT model and semi-supervised incremental learning is proposed. The heterogeneous knowledge fusion task is formulated as a pairwise binary classification of entities in the knowledge units. First, the deep semantic features of fault knowledge units are obtained by BERT. During fine-tuning, we also investigate how knowledge unit features extracted from different hidden layers of the model affect heterogeneous knowledge fusion. Second, to make better use of unlabeled test samples, a semi-supervised incremental learning strategy based on pseudo labels is devised: entity pairs predicted with high confidence are assigned pseudo labels and added to the labeled sample set, which realizes incremental learning and enhances the knowledge fusion ability of the model. Furthermore, the model's robustness is strengthened by embedding-based adversarial training in the fine-tuning stage. Based on the on-board system's O&M data, this paper constructs the fault KB and compares the model with other solutions developed for related matching tasks. The comparison verifies the effectiveness of the model in the heterogeneous knowledge fusion task of the on-board system.
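
The pseudo-label step of the semi-supervised incremental learning strategy can be illustrated with a minimal sketch. The abstract does not give the selection rule, so everything specific here is an assumption: the function names `select_pseudo_labels` and `incremental_round`, the confidence threshold of 0.95, and the choice to treat both very high and very low match probabilities as confident predictions. The probabilities stand in for the output of a fine-tuned pairwise classifier, and the entity pairs are hypothetical placeholders.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (entity from source A, entity from source B)


def select_pseudo_labels(probs: List[float], threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Return (index, pseudo_label) for confidently predicted pairs.

    probs[i] is the assumed match probability of unlabeled pair i.
    Label 1 = match, 0 = non-match. The threshold is an assumption.
    """
    selected = []
    for i, p in enumerate(probs):
        if p >= threshold:
            selected.append((i, 1))      # confident match
        elif p <= 1.0 - threshold:
            selected.append((i, 0))      # confident non-match
    return selected


def incremental_round(
    labeled: List[Tuple[Pair, int]],
    unlabeled: List[Pair],
    probs: List[float],
    threshold: float = 0.95,
) -> Tuple[List[Tuple[Pair, int]], List[Pair]]:
    """Move confident pairs from the unlabeled pool into the labeled set."""
    picked = select_pseudo_labels(probs, threshold)
    picked_idx = {i for i, _ in picked}
    expanded = labeled + [(unlabeled[i], y) for i, y in picked]
    remaining = [x for i, x in enumerate(unlabeled) if i not in picked_idx]
    return expanded, remaining


# Hypothetical placeholder data, not taken from the paper.
labeled = [(("entity_a0", "entity_b0"), 1)]
unlabeled = [("entity_a1", "entity_b1"), ("entity_a2", "entity_b2"),
             ("entity_a3", "entity_b3"), ("entity_a4", "entity_b4")]
probs = [0.99, 0.50, 0.02, 0.90]  # stand-in classifier outputs

expanded, remaining = incremental_round(labeled, unlabeled, probs)
```

In a full loop, the classifier would be fine-tuned again on `expanded` and the remaining pool re-scored, with the process repeated until no further pairs pass the threshold. A stricter threshold admits fewer but cleaner pseudo labels, so it trades the size of the expanded set against label noise.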