Dataset Information

Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data.

ABSTRACT:

Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools for studying spatial genome organization. Removing biases from the chromatin contact matrices generated by such techniques is a critical preprocessing step for subsequent analyses. The continuing decline of sequencing costs has led to ever-improving resolution of Hi-C data, resulting in very large matrices of chromatin contacts. Such large matrices, however, pose a great challenge to the memory usage and speed of their normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalizing Hi-C data. We developed Hi-Corrector, an easy-to-use, open source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability: the software is capable of normalizing Hi-C data of any size in reasonable time; (ii) memory efficiency: the sequential version can run on any single computer with very limited memory, no matter how little; (iii) speed: the parallel version can run very fast on multiple computing nodes with limited local memory.

Availability and implementation

The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/.
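The abstract does not spell out the normalization algorithm, so the following is only a minimal sketch, assuming an iterative-correction style of matrix balancing: each bin gets a bias factor, and the factors are updated until all corrected row sums are equal. The function names (`corrected_row_sums`, `iterative_correction`) and the in-memory row-major array are hypothetical choices for illustration and are not taken from the Hi-Corrector source. The point of the sketch is that each pass touches one row at a time, which suggests how a sequential implementation could stream rows from disk rather than hold the whole matrix in memory.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Compute the row sums of the corrected matrix W_ij = M_ij / (b_i * b_j).
 * Only one row of M is read at a time; in an out-of-core variant (an
 * assumption, not the package's documented design) each row could be
 * loaded from disk here instead of from an in-memory array. */
void corrected_row_sums(const double *m, size_t n,
                        const double *bias, double *sums)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        const double *row = m + i * n;
        double s = 0.0;
        for (j = 0; j < n; j++)
            s += row[j] / (bias[i] * bias[j]);
        sums[i] = s;
    }
}

/* Hypothetical driver: update the bias vector until every non-empty row
 * of the corrected matrix has (nearly) the same sum.
 * Returns the number of completed update passes. */
int iterative_correction(const double *m, size_t n, double *bias,
                         double *sums, int max_iter, double tol)
{
    size_t i, nonzero;
    int it;
    double mean, max_dev, f, dev;

    assert(n > 0 && max_iter > 0);
    for (i = 0; i < n; i++)
        bias[i] = 1.0;

    for (it = 0; it < max_iter; it++) {
        corrected_row_sums(m, n, bias, sums);

        /* Mean over non-empty rows, so that empty bins are left alone. */
        mean = 0.0;
        nonzero = 0;
        for (i = 0; i < n; i++) {
            if (sums[i] > 0.0) {
                mean += sums[i];
                nonzero++;
            }
        }
        if (nonzero == 0)
            break;
        mean /= (double)nonzero;

        /* Fold each row's relative deviation into its bias factor. */
        max_dev = 0.0;
        for (i = 0; i < n; i++) {
            if (sums[i] > 0.0) {
                f = sums[i] / mean;
                dev = fabs(f - 1.0);
                if (dev > max_dev)
                    max_dev = dev;
                bias[i] *= f;
            }
        }
        if (max_dev < tol)
            break;
    }
    return it;
}
```

A caller would pass a symmetric contact matrix and read the resulting `bias` vector; dividing each entry `M_ij` by `bias[i] * bias[j]` then gives the normalized matrix. Because only the bias and row-sum vectors persist between passes, the working memory in this sketch grows linearly with the number of bins, not quadratically.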

SUBMITTER: Li W

PROVIDER: S-EPMC4380031 | biostudies-literature | 2015 Mar

REPOSITORIES: biostudies-literature

Publications

Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data.

Li W, Gong K, Li Q, Alber F, Zhou XJ

Bioinformatics (Oxford, England), 2014 Nov 12, issue 6

PMID: 25391400

Similar Datasets

SCEMENT: scalable and memory efficient integration of large-scale single-cell RNA-sequencing data.
| S-EPMC12013815 | biostudies-literature

Fast and Memory-Efficient Dynamic Programming Approach for Large-Scale EHH-Based Selection Scans.
| S-EPMC12659807 | biostudies-literature

FastQTLmapping: an ultra-fast and memory efficient package for mQTL-like analysis.
| S-EPMC12036243 | biostudies-literature

SILGGM: An extensive R package for efficient statistical inference in large-scale gene networks.
| S-EPMC6107288 | biostudies-literature

Analog in-memory computing attention mechanism for fast and energy-efficient large language models.
| S-EPMC12457188 | biostudies-literature

A highly efficient, scalable pipeline for fixed feature extraction from large-scale high-content imaging screens.
| S-EPMC11667173 | biostudies-literature

CATE: A fast and scalable CUDA implementation to conduct highly parallelized evolutionary tests on large scale genomic data.
| S-EPMC12373131 | biostudies-literature

A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank.
| S-EPMC7641476 | biostudies-literature