Generalized Co-Clustering Analysis via Regularized Alternating Least Squares.
ABSTRACT: Biclustering is an important exploratory analysis tool that simultaneously clusters rows (e.g., samples) and columns (e.g., variables) of a data matrix. Checkerboard-like biclusters reveal intrinsic associations between rows and columns. However, most existing methods rely on Gaussian assumptions and only apply to matrix data. In practice, non-Gaussian and/or multi-way tensor data are frequently encountered. A new CO-clustering method via Regularized Alternating Least Squares (CORALS) is proposed, which generalizes biclustering to non-Gaussian data and multi-way tensor arrays. Non-Gaussian data are modeled with single-parameter exponential family distributions and co-clusters are identified in the natural parameter space via sparse CANDECOMP/PARAFAC tensor decomposition. A regularized alternating (iteratively reweighted) least squares algorithm is devised for model fitting and a deflation procedure is exploited to automatically determine the number of co-clusters. Comprehensive simulation studies and three real data examples demonstrate the efficacy of the proposed method. The data and code are publicly available.
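ILLUSTRATION: The alternating, regularized fitting and the deflation step described in the abstract can be sketched for the simplest setting only: a Gaussian data matrix, where the natural parameter is the mean and no reweighting is needed. This is a minimal sketch under stated assumptions, not the CORALS implementation. The function names (`soft_threshold`, `sparse_rank_one`, `extract_layers`), the lasso-type soft-thresholding penalty, the penalty values, and the toy data are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def soft_threshold(x, lam):
    """Elementwise soft-thresholding (assumed lasso-type penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank_one(X, lam_u, lam_v, n_iter=100, tol=1e-8):
    """Fit X ~ d * outer(u, v) with sparse unit vectors u and v.

    Alternates two penalized least squares updates. Returns d = 0.0
    when the penalty removes every entry of u or v.
    """
    # Start from the leading singular pair of X.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        u_old = u
        # Update v with u fixed, then rescale to unit length.
        v = soft_threshold(X.T @ u, lam_v)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return 0.0, np.zeros(X.shape[0]), v
        v = v / norm_v
        # Update u with v fixed, then rescale to unit length.
        u = soft_threshold(X @ v, lam_u)
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:
            return 0.0, u, np.zeros(X.shape[1])
        u = u / norm_u
        if np.linalg.norm(u - u_old) < tol:
            break
    d = float(u @ X @ v)
    return d, u, v


def extract_layers(X, lam_u, lam_v, max_layers=10):
    """Deflation: peel off rank-one layers until one is entirely zero."""
    layers = []
    residual = X.copy()
    for _ in range(max_layers):
        d, u, v = sparse_rank_one(residual, lam_u, lam_v)
        if d == 0.0:
            break
        layers.append((d, u, v))
        residual = residual - d * np.outer(u, v)
    return layers


# Toy data (hypothetical): one planted block in rows 0-9, columns 0-4.
rng = np.random.default_rng(0)
X = 0.05 * rng.standard_normal((40, 20))
X[:10, :5] += 5.0

layers = extract_layers(X, lam_u=2.0, lam_v=2.0)
for d, u, v in layers:
    # Nonzero entries of u and v mark the rows and columns of a co-cluster.
    print(np.flatnonzero(u), np.flatnonzero(v))
```

In this sketch the nonzero entries of `u` and `v` identify the rows and columns of one co-cluster, and the loop stops when a fitted layer is entirely zero, which fixes the number of co-clusters without setting it in advance. Handling non-Gaussian data would replace each least squares update with an iteratively reweighted one in the natural parameter space, and tensor data would add one such update per mode; neither extension is shown here.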
SUBMITTER: Li G
PROVIDER: S-EPMC7297185 | biostudies-literature
REPOSITORIES: biostudies-literature