Online Causal Feature Selection for Streaming Features

IEEE Trans Neural Netw Learn Syst. 2023 Mar;34(3):1563-1577. doi: 10.1109/TNNLS.2021.3105585. Epub 2023 Feb 28.

Abstract

Recently, causal feature selection (CFS) has attracted considerable attention due to its outstanding interpretability and predictive performance. Such methods primarily include Markov blanket (MB) discovery and feature selection based on Granger causality. As a representative example, the max-min MB (MMMB) algorithm can mine an optimal feature subset, i.e., the MB; however, it is unsuitable for streaming features. Online streaming feature selection (OSFS) processes streaming features online and can determine the parents and children (PC), a subset of the MB; however, it cannot mine the full MB of the target attribute (T), i.e., a given feature, which results in insufficient prediction accuracy. The Granger selection method (GSM) establishes a causal matrix of all features but requires excessive running time; moreover, it cannot achieve high prediction accuracy and can only forecast fixed multivariate time series data. To address these issues, we propose an online CFS for streaming features (OCFSSFs) that mines the MB, comprising PC and spouses, by interleaving PC learning and spouse learning. Furthermore, it distinguishes between PC and spouses in real time and can identify children with parents online when identifying spouses. We experimentally evaluated the proposed algorithm on synthetic datasets using precision, recall, and distance. In addition, the algorithm was tested on real-world and time series datasets using classification precision, the number of selected features, and running time. The results validated the effectiveness of the proposed algorithm.
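
To make the relationship between PC, spouses, and the MB concrete, the following is a minimal sketch that reads these sets off a known directed acyclic graph. It is not the OCFSSFs algorithm, which learns these sets from streaming data; the toy graph, the function names, and the example PC-only result are all hypothetical. The distance formula is an assumption as well: the abstract does not define it, and the sketch uses the Euclidean distance from perfect precision and recall that is common in MB discovery evaluations.

```python
from math import sqrt

# Hypothetical toy DAG, given as {node: set of its parents}. Not from the paper.
PARENTS = {
    "A": set(),
    "B": set(),
    "T": {"A"},        # A -> T
    "C": {"T", "B"},   # T -> C <- B, so B is a spouse of T
    "D": {"C"},        # C -> D, D lies outside the MB of T
}

def children_of(parents, target):
    """Nodes that list the target among their parents."""
    return {node for node, ps in parents.items() if target in ps}

def parents_children(parents, target):
    """PC set: direct parents and direct children of the target."""
    return parents[target] | children_of(parents, target)

def spouses(parents, target):
    """Other parents of the target's children."""
    found = set()
    for child in children_of(parents, target):
        found |= parents[child]
    found.discard(target)
    return found

def markov_blanket(parents, target):
    """MB = PC plus spouses."""
    return parents_children(parents, target) | spouses(parents, target)

def score(found, truth):
    """Precision, recall, and (assumed) distance of a discovered set."""
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    # Assumed definition: Euclidean distance from the ideal point (1, 1).
    distance = sqrt((1 - precision) ** 2 + (1 - recall) ** 2)
    return precision, recall, distance

if __name__ == "__main__":
    true_mb = markov_blanket(PARENTS, "T")
    pc_only = parents_children(PARENTS, "T")  # what a PC-only method would return
    print("MB:", sorted(true_mb))
    print("PC:", sorted(pc_only))
    print("PC-only score (precision, recall, distance):", score(pc_only, true_mb))
```

In this toy graph a PC-only result has perfect precision but misses the spouse B, so its recall against the full MB drops, which illustrates why recovering spouses matters when the MB is the intended feature subset.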