V³H: View Variation and View Heredity for Incomplete Multiview Clustering

Xiang Fang; Yuchong Hu; Pan Zhou; Dapeng Oliver Wu

doi:10.1109/TAI.2021.3052425

IEEE Trans Artif Intell. 2021 Jan 18;1(3):233-247. doi: 10.1109/TAI.2021.3052425. eCollection 2020 Dec.

Authors

Xiang Fang¹, Yuchong Hu¹, Pan Zhou², Dapeng Oliver Wu³

Affiliations

¹ School of Computer Science and Technology, Key Laboratory of Information Storage System, Ministry of Education of China, Huazhong University of Science and Technology, Wuhan 430074, China.
² Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China.
³ Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA.

Abstract

Real data often appear in the form of multiple incomplete views. Incomplete multiview clustering is an effective method to integrate these incomplete views. Previous methods learn only the consistent information shared across views and ignore the unique information of each view, which limits their clustering performance and generalization. To overcome this limitation, we propose a novel View Variation and View Heredity approach (V³H). Inspired by variation and heredity in genetics, V³H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix shared by all views, which represent the unique information and the consistent information, respectively. Then, by aligning different views based on their cluster indicator matrices, V³H integrates the unique information from different views to improve clustering performance. Finally, using an adjustable low-rank representation based on the heredity matrix, V³H recovers the underlying true data structure to reduce the influence of large incompleteness. More importantly, V³H is possibly the first work to introduce genetics to clustering algorithms in order to learn the consistent information and the unique information simultaneously from incomplete multiview data. Extensive experimental results on fifteen benchmark datasets validate its superiority over other state-of-the-art methods.

Impact Statement: Incomplete multiview clustering is a popular technology for clustering incomplete datasets from multiple sources. The technology is becoming more significant because it does not require costly labeling of these datasets. However, previous algorithms cannot fully learn the information of each view. Inspired by variation and heredity in genetics, our proposed algorithm V³H fully learns the information of each view. Compared with state-of-the-art algorithms, V³H improves clustering performance by more than 20% in representative cases. With this large improvement on multiple datasets, V³H has wide potential applications, including the analysis of pandemic, financial, and election datasets. The DOI of our code is 10.24433/CO.2119636.v1.
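
The split into a shared heredity part and per-view variation parts can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not state the objective function, so the example below substitutes a plain masked average for the shared part and treats each view's residual as its variation. The function name `split_heredity_variation`, the per-view embeddings, and the presence masks are all hypothetical and exist only for illustration.

```python
import numpy as np

def split_heredity_variation(views, present):
    """Toy split of per-view embeddings into a shared part and view-specific parts.

    Assumption: a masked mean stands in for the shared ("heredity") part;
    the paper's actual optimization is not given in the abstract.

    views:   list of (n_samples, k) arrays, one embedding per view
    present: list of boolean (n_samples,) masks; False marks a sample
             that is missing in that view
    """
    stacked = np.stack(views)                             # (n_views, n_samples, k)
    mask = np.stack(present)[:, :, None].astype(float)    # (n_views, n_samples, 1)
    counts = np.maximum(mask.sum(axis=0), 1.0)            # guard against samples seen in no view
    heredity = (stacked * mask).sum(axis=0) / counts      # average over the views that observe each sample
    # Residual of each view against the shared part, zeroed where the sample is missing.
    variations = [(z - heredity) * m for z, m in zip(stacked, mask)]
    return heredity, variations
```

On rows where a sample is observed, adding a view's variation back to the heredity part reproduces that view's embedding exactly; rows for missing samples carry no variation at all.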

Keywords: Incomplete multiview clustering; view heredity; view variation.

Grants and funding

This work is supported by the National Natural Science Foundation of China (NSFC) under Grant 61972448.