Dataset Information

Fast R Functions for Robust Correlations and Hierarchical Clustering.

ABSTRACT: Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation handles data without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing values. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA.

The hierarchical clustering algorithm implemented in the R function hclust is an order n^3 (n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust, which implements the original algorithm and in practice achieves order approximately n^2, leading to substantial time savings when clustering large data sets.
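The abstract names the biweight midcorrelation without defining it. As a rough illustration, the sketch below follows the commonly stated definition: each value is centered on the median, scaled by 9 times the raw median absolute deviation (MAD), and weighted by (1 - u^2)^2, with weight 0 when |u| >= 1. It is written in plain Python rather than R, the function names `pearson`, `biweight_terms` and `bicor` are hypothetical, and it is not the WGCNA implementation: it handles neither missing values nor parallel execution, and it simply raises an error when the MAD is zero.

```python
import math
from statistics import median


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def biweight_terms(x):
    """Median-centered values multiplied by their biweight weights."""
    med = median(x)
    mad = median(abs(v - med) for v in x)  # raw (unscaled) MAD
    if mad == 0:
        # Assumption of this sketch: give up rather than fall back to
        # another measure when the MAD is zero.
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in x:
        u = (v - med) / (9 * mad)
        # Values far from the median (|u| >= 1) receive weight 0.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((v - med) * w)
    return terms


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    a = biweight_terms(x)
    b = biweight_terms(y)
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den


# Toy data: a perfectly linear relationship, then the same data with
# one gross outlier in the last position.
x = [float(v) for v in range(1, 11)]
y_clean = [2 * v + 1 for v in x]
y_outlier = y_clean[:-1] + [-100.0]

print("Pearson, clean data:  ", pearson(x, y_clean))
print("bicor, clean data:    ", bicor(x, y_clean))
print("Pearson, with outlier:", pearson(x, y_outlier))
print("bicor, with outlier:  ", bicor(x, y_outlier))
```

On this toy data the single outlier turns the Pearson correlation negative, while the biweight midcorrelation stays clearly positive because the outlying value receives weight 0.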

SUBMITTER: Langfelder P

PROVIDER: S-EPMC3465711 | biostudies-literature | 2012 Mar

REPOSITORIES: biostudies-literature

Publications

Fast R Functions for Robust Correlations and Hierarchical Clustering.

Langfelder P, Horvath S

Journal of Statistical Software, 2012-03-01, 11

PMID: 23050260

Similar Datasets

Fast approximate hierarchical clustering using similarity heuristics. | S-EPMC2561018 | biostudies-literature

Fast tree aggregation for consensus hierarchical clustering. | S-EPMC7085155 | biostudies-literature

R/BHC: fast Bayesian hierarchical clustering for microarray data. | S-EPMC2736174 | biostudies-literature

COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations. | S-EPMC1620014 | biostudies-literature

Swarm: robust and fast clustering method for amplicon-based studies. | S-EPMC4178461 | biostudies-literature

Hierarchical clustering of PDAC cell lines | 2022-11-19 | E-MTAB-8173 | biostudies-arrayexpress

Non-supervised hierarchical clustering of gene expression data | 2008-08-30 | GSE12627 | GEO

A robust and fast two-sample test of equal correlations with an application to differential co-expression. | S-EPMC10278156 | biostudies-literature

Divisive hierarchical maximum likelihood clustering. | S-EPMC5751574 | biostudies-literature

Statistical significance for hierarchical clustering. | S-EPMC5708128 | biostudies-literature