Dataset Information

Generalized Co-Clustering Analysis via Regularized Alternating Least Squares.


ABSTRACT: Biclustering is an important exploratory analysis tool that simultaneously clusters rows (e.g., samples) and columns (e.g., variables) of a data matrix. Checkerboard-like biclusters reveal intrinsic associations between rows and columns. However, most existing methods rely on Gaussian assumptions and only apply to matrix data. In practice, non-Gaussian and/or multi-way tensor data are frequently encountered. A new CO-clustering method via Regularized Alternating Least Squares (CORALS) is proposed, which generalizes biclustering to non-Gaussian data and multi-way tensor arrays. Non-Gaussian data are modeled with single-parameter exponential family distributions and co-clusters are identified in the natural parameter space via sparse CANDECOMP/PARAFAC tensor decomposition. A regularized alternating (iteratively reweighted) least squares algorithm is devised for model fitting and a deflation procedure is exploited to automatically determine the number of co-clusters. Comprehensive simulation studies and three real data examples demonstrate the efficacy of the proposed method. The data and code are publicly available.
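To make the alternating scheme in the abstract more concrete, here is a minimal sketch of its simplest special case: a single co-cluster in a Gaussian data matrix, fitted by alternating L1-penalized least-squares updates of a row vector and a column vector. This is not the authors' CORALS code. It assumes a rank-one model, Gaussian noise, and a plain soft-thresholding penalty, and it omits the exponential-family reweighting, the tensor (CANDECOMP/PARAFAC) extension, and the deflation step. The names `soft_threshold` and `sparse_rank_one`, the penalty values, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank_one(X, lam_u, lam_v, n_iter=100, tol=1e-8):
    """Hypothetical helper: fit X ~ d * outer(u, v) with sparse unit-norm u, v."""
    # Start from the leading singular vectors of X.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        u_old = u
        # With unit-norm u fixed, the L1-penalized least-squares update of v
        # is a soft-thresholded projection; rescale it to unit norm afterwards.
        v = soft_threshold(X.T @ u, lam_v)
        if not v.any():  # penalty removed every column: no co-cluster found
            return 0.0, np.zeros(X.shape[0]), v
        v = v / np.linalg.norm(v)
        # Symmetric update of u with v fixed.
        u = soft_threshold(X @ v, lam_u)
        if not u.any():  # penalty removed every row: no co-cluster found
            return 0.0, u, np.zeros(X.shape[1])
        u = u / np.linalg.norm(u)
        if np.linalg.norm(u - u_old) < tol:
            break
    d = float(u @ X @ v)  # scale of the fitted rank-one layer
    return d, u, v


# Simulated example (assumption): a 10 x 5 block with mean 3 in a noisy 30 x 20 matrix.
rng = np.random.default_rng(0)
X = 0.1 * rng.standard_normal((30, 20))
X[:10, :5] += 3.0

d, u, v = sparse_rank_one(X, lam_u=1.0, lam_v=1.0)
row_cluster = np.flatnonzero(u)  # rows assigned to the co-cluster
col_cluster = np.flatnonzero(v)  # columns assigned to the co-cluster
print(row_cluster, col_cluster)
```

The nonzero entries of `u` and `v` index the rows and columns of the recovered co-cluster. In this sketch the penalty levels are fixed by hand; how the published method selects them is not stated in the abstract.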

SUBMITTER: Li G 

PROVIDER: S-EPMC7297185 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6509871 | biostudies-literature
| S-EPMC4598036 | biostudies-literature
| S-EPMC9785721 | biostudies-literature
| S-EPMC11015955 | biostudies-literature
| S-EPMC4497637 | biostudies-literature
| S-EPMC5548838 | biostudies-literature
| S-EPMC4143681 | biostudies-literature
| S-EPMC6311892 | biostudies-other
| S-EPMC9613874 | biostudies-literature
| S-EPMC5727873 | biostudies-literature