Entropy Minimizing Matrix Factorization

IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):9209-9222. doi: 10.1109/TNNLS.2022.3157148. Epub 2023 Oct 27.

Abstract

Nonnegative matrix factorization (NMF) is a widely used data analysis technique that has yielded impressive results in many real-world tasks. Generally, existing NMF methods represent each sample with several centroids and find the optimal centroids by minimizing the sum of the residual errors. However, outliers deviating from the normal data distribution may have large residuals and thus dominate the objective value. In this study, an entropy minimizing matrix factorization (EMMF) framework is developed to tackle this problem. Considering that outliers are usually far fewer than the normal samples, a new entropy loss function is established for matrix factorization, which minimizes the entropy of the residual distribution and allows a few samples to have large errors. In this way, the outliers do not affect the approximation of the normal samples. Multiplicative updating rules for EMMF are derived, and their convergence is proven theoretically. In addition, a graph-regularized version of EMMF (G-EMMF) is also presented, which uses a data graph to capture the relationships among samples. Clustering results on various synthetic and real-world datasets demonstrate the advantages of the proposed models, and their effectiveness is further verified through comparison with state-of-the-art methods.
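For context on the multiplicative updating rules mentioned in the abstract, the sketch below shows the classical NMF multiplicative updates for the Frobenius-norm objective (the baseline that EMMF builds on); the EMMF-specific entropy-weighted rules are not given in the abstract and are not reproduced here. All function and variable names are illustrative.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Classical NMF via multiplicative updates minimizing ||X - W H||_F^2.

    X : (m, n) nonnegative data matrix.
    k : rank of the factorization (number of centroids).
    Returns nonnegative factors W (m, k) and H (k, n).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity by construction;
        # eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage: factor a small nonnegative matrix and measure the fit.
rng = np.random.default_rng(1)
X = rng.random((20, 30))
W, H = nmf_multiplicative(X, k=5)
err = np.linalg.norm(X - W @ H)
```

Because every sample's residual enters this objective with equal weight, a single outlier with a large residual can dominate the sum, which is exactly the failure mode the entropy loss in EMMF is designed to mitigate.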