Generalized Co-Clustering Analysis via Regularized Alternating Least Squares

Comput Stat Data Anal. 2020 Oct:150:106989. doi: 10.1016/j.csda.2020.106989. Epub 2020 May 4.

Abstract

Biclustering is an important exploratory analysis tool that simultaneously clusters rows (e.g., samples) and columns (e.g., variables) of a data matrix. Checkerboard-like biclusters reveal intrinsic associations between rows and columns. However, most existing methods rely on Gaussian assumptions and only apply to matrix data. In practice, non-Gaussian and/or multi-way tensor data are frequently encountered. A new CO-clustering method via Regularized Alternating Least Squares (CORALS) is proposed, which generalizes biclustering to non-Gaussian data and multi-way tensor arrays. Non-Gaussian data are modeled with single-parameter exponential family distributions and co-clusters are identified in the natural parameter space via sparse CANDECOMP/PARAFAC tensor decomposition. A regularized alternating (iteratively reweighted) least squares algorithm is devised for model fitting and a deflation procedure is exploited to automatically determine the number of co-clusters. Comprehensive simulation studies and three real data examples demonstrate the efficacy of the proposed method. The data and code are publicly available.

Keywords: Biclustering; Exponential Family; Generalized Linear Model; Parafac/Candecomp; Tensor.