Dataset Information

A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data.

ABSTRACT: A common goal in data analysis is to sift through a large data matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., 'loops') within the data matrix, and focuses on rows and columns of the data matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design that commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical and continuous covariates, and sparsity within the data. We demonstrate these practical features with two examples: the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide association study (GWAS).

SUBMITTER: Rangan AV

PROVIDER: S-EPMC5997363 | biostudies-literature | 2018 May

REPOSITORIES: biostudies-literature

Publications

A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data.

Rangan AV, McGrouther CC, Kelsoe J, Schork N, Stahl E, Zhu Q, Krishnan A, Yao V, Troyanskaya O, Bilaloglu S, Raghavan P, Bergen S, Jureus A, Landen M

PLoS Computational Biology, 14 May 2018; issue 5

PMID: 29758032

Similar Datasets

A regularization corrected score method for nonlinear regression models with covariate error. | S-EPMC3622191 | biostudies-literature

GrCount: Counting method for uncertain data. | S-EPMC6838520 | biostudies-literature

Robust covariate-adjusted log-rank statistics and corresponding sample size formula for recurrent events data. | S-EPMC2795392 | biostudies-literature

Covariate-adjusted Spearman's rank correlation with probability-scale residuals. | S-EPMC5949238 | biostudies-literature

A Low-Rank Method for Characterizing High-Level Neural Computations. | S-EPMC5534486 | biostudies-literature

Myocardial T1, T2, T2*, and fat fraction quantification via low-rank motion-corrected cardiac MR fingerprinting. | S-EPMC9306903 | biostudies-literature

Inference on Low-Rank Data Matrices with Applications to Microarray Data. | S-EPMC2876352 | biostudies-literature

Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data. | S-EPMC5870648 | biostudies-literature

Low-Rank and Sparse Recovery of Human Gait Data. | S-EPMC7472490 | biostudies-literature

Improved low-rank matrix recovery method for predicting miRNA-disease association. | S-EPMC5519594 | biostudies-literature