ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data.
ABSTRACT: Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and result in massive datasets that introduce significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. ChIPWig enables random access and summary statistics lookups, and is based on the asymptotic theory of optimal point density design for nonuniform quantizers. Results: We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced file sizes to merely 6% of the original and offered a 6-fold compression rate improvement over bigWig. The lossy mode further reduced file sizes 2-fold compared to the lossless mode, with little or no effect on peak calling and motif discovery using specialized NarrowPeaks methods. Compression and decompression speeds are on the order of 0.2 sec/MB on general-purpose computers. Availability and implementation: The source code and binaries, implemented in C++, are freely available for download at https://github.com/vidarmehr/ChIPWig-v2. Contact: milenkov@illinois.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
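The abstract states that the lossy mode rests on the asymptotic theory of optimal point density design for nonuniform quantizers. For intuition, the classical high-resolution result for mean-squared error (Bennett's integral, D ≈ (1/(12 N^2)) ∫ p(x)/λ(x)^2 dx, minimized by Panter and Dite) is that the reconstruction points should be placed with density λ(x) proportional to p(x)^(1/3), where p(x) is the distribution of the values being quantized. The Python sketch below illustrates only that textbook rule. It assumes squared-error distortion, estimates p(x) with an equal-width histogram, and uses synthetic exponential values as a stand-in for skewed Wig signal; the function names are hypothetical, and this is not ChIPWig's actual quantizer, whose distortion measure and design details are given in the paper.

```python
import bisect
import random


def point_density_levels(samples, num_levels, num_bins=64):
    """Hypothetical helper: place num_levels reconstruction points whose
    density follows p(x)^(1/3), the high-resolution MSE-optimal point density.

    Assumption: p(x) is estimated with an equal-width histogram over the
    sample range; this is an illustration, not ChIPWig's code.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [float(lo)] * num_levels
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        counts[min(int((x - lo) / width), num_bins - 1)] += 1

    # Point density lambda(x) ~ p(x)^(1/3); histogram counts stand in for p(x).
    weights = [c ** (1.0 / 3.0) for c in counts]
    total = sum(weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    cdf[-1] = 1.0  # guard against floating-point drift in the running sum

    levels = []
    for k in range(num_levels):
        target = (k + 0.5) / num_levels  # centre of the k-th equal-mass cell of lambda
        j = bisect.bisect_right(cdf, target) - 1  # bin with cdf[j] <= target < cdf[j+1]
        frac = (target - cdf[j]) / (cdf[j + 1] - cdf[j])  # linear interpolation inside bin j
        levels.append(lo + (j + frac) * width)
    return levels


def quantize(x, levels):
    """Map x to the nearest reconstruction level (levels must be sorted)."""
    i = bisect.bisect_left(levels, x)
    if i == 0:
        return levels[0]
    if i == len(levels):
        return levels[-1]
    before, after = levels[i - 1], levels[i]
    return before if x - before <= after - x else after


# Usage on synthetic, right-skewed values (a stand-in for Wig signal, not real data).
rng = random.Random(0)
signal = [rng.expovariate(1.0) for _ in range(10_000)]
levels = point_density_levels(signal, num_levels=16)
reconstructed = [quantize(x, levels) for x in signal]
mse = sum((x - y) ** 2 for x, y in zip(signal, reconstructed)) / len(signal)
```

Because p(x)^(1/3) decays more slowly than p(x) itself, the levels concentrate where values are common (here, near zero) while still leaving some levels in the sparse tail, which is the trade-off that makes this density MSE-optimal rather than simply matching the data's quantiles.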
SUBMITTER: Ravanmehr V
PROVIDER: S-EPMC5860022 | biostudies-literature | 2018 Mar
REPOSITORIES: biostudies-literature