
Dataset Information


pyBedGraph: a Python package for fast operations on 1D genomic signal tracks.


ABSTRACT: MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed. RESULTS: We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ~0.26 s and can compute their approximate means in <0.12 s on a conventional laptop. AVAILABILITY AND IMPLEMENTATION: pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
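To make the "exact mean signal of a region" task concrete, here is a minimal sketch that expands bedGraph intervals into a per-base array and answers many interval-mean queries with a prefix sum. It is an illustration only and does not reproduce pyBedGraph's API or implementation; the function names `load_bedgraph_lines` and `interval_means`, the toy data, and the chromosome size are all assumptions made for this example.

```python
import numpy as np


def load_bedgraph_lines(lines, chrom, chrom_size):
    """Expand bedGraph intervals for one chromosome into a per-base array.

    Hypothetical helper: positions not covered by any interval stay at 0.
    """
    signal = np.zeros(chrom_size, dtype=np.float64)
    for line in lines:
        fields = line.split()
        # Skip track/header lines and records for other chromosomes.
        if len(fields) != 4 or fields[0] != chrom:
            continue
        start, end, value = int(fields[1]), int(fields[2]), float(fields[3])
        # bedGraph coordinates are 0-based and half-open: [start, end).
        signal[start:end] = value
    return signal


def interval_means(signal, starts, ends):
    """Exact mean signal over each [start, end) interval via a prefix sum."""
    prefix = np.concatenate(([0.0], np.cumsum(signal)))
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    return (prefix[ends] - prefix[starts]) / (ends - starts)


# Toy input standing in for the contents of a bedGraph file.
example = [
    "chr1\t0\t4\t1.0",
    "chr1\t4\t8\t3.0",
    "chr2\t0\t5\t9.0",
]
signal = load_bedgraph_lines(example, "chr1", 10)
means = interval_means(signal, [0, 2, 6], [4, 6, 10])
print(means)
```

The prefix sum is built once per chromosome, after which each query costs two array lookups and a division, so a large batch of regions can be evaluated in one vectorized call.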

SUBMITTER: Zhang HB 

PROVIDER: S-EPMC7214040 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature


Publications

pyBedGraph: a Python package for fast operations on 1D genomic signal tracks.

Zhang HB, Kim M, Chuang JH, Ruan Y

Bioinformatics (Oxford, England), 2020-05-01, issue 10



Similar Datasets

| S-EPMC10785526 | biostudies-literature
| S-EPMC10833567 | biostudies-literature
| S-EPMC4837986 | biostudies-literature
| S-EPMC7597035 | biostudies-literature
| S-EPMC8138882 | biostudies-literature
| S-EPMC8168212 | biostudies-literature
| S-EPMC8275978 | biostudies-literature
| S-EPMC5022704 | biostudies-literature
| S-EPMC6454532 | biostudies-literature
| S-EPMC7271019 | biostudies-literature