
Dataset Information


Balancing efficient analysis and storage of quantitative genomics data with the D4 format and d4tools.


ABSTRACT: Modern DNA sequencing is used as a readout for diverse assays, with the count of aligned sequences (read depth) representing the quantitative signal for each underlying cellular phenomenon. Existing data formats for quantitative genomics assays are, however, limited in the analysis speeds they enable, the disk space they require, or both. We have developed the dense depth data dump (D4) format and tool suite, with the goal of balancing improved analysis speeds with file size. The D4 format is adaptive in that it profiles a random sample of aligned sequence depth from the input sequence file to determine an optimal encoding that enables fast data access. We demonstrate that the D4 format offers substantial speed improvements over existing formats for random access, aggregation and summarization, while also achieving better or comparable file sizes. This performance enables scalable downstream analyses that would otherwise be difficult.
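The abstract describes an adaptive encoding chosen by profiling a random sample of per-base depth. The Python sketch below illustrates that idea under a toy model loosely patterned on a two-table layout: each base gets a fixed-width k-bit code in a "primary" array, the all-ones code marks values too large to fit, and those values go into a run-length "secondary" table of intervals. Everything here is an assumption for illustration, not the d4tools implementation: the function names (`sample_windows`, `choose_bit_width`, `encode`), the `Track` class, the cost model, and the hypothetical `RUN_BITS` record size are all invented, and the primary array is kept as plain Python ints rather than bit-packed.

```python
import bisect
import random
from dataclasses import dataclass, field

# Hypothetical cost of one secondary-table record: start, end and value as 32-bit ints.
RUN_BITS = 96


def sample_windows(depths, n_windows, width, seed=0):
    """Profiling step (sketch): concatenate randomly placed windows of the depth track."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n_windows):
        start = rng.randrange(max(1, len(depths) - width + 1))
        sample.extend(depths[start:start + width])
    return sample


def overflow_runs(depths, sentinel):
    """Count runs of consecutive, equal depths that are too large for the primary table."""
    runs, prev = 0, None
    for d in depths:
        if d >= sentinel:
            if d != prev:
                runs += 1
            prev = d
        else:
            prev = None
    return runs


def choose_bit_width(sample, max_k=8):
    """Return the k (0..max_k) that minimizes the toy estimated size of the sample."""
    def cost(k):
        sentinel = (1 << k) - 1  # all-ones code means "look in the secondary table"
        return len(sample) * k + overflow_runs(sample, sentinel) * RUN_BITS
    return min(range(max_k + 1), key=cost)


@dataclass
class Track:
    primary: list    # one code per base (plain ints here, not bit-packed)
    secondary: list  # sorted half-open intervals (start, end, value)
    sentinel: int
    starts: list = field(init=False)

    def __post_init__(self):
        self.starts = [start for start, _, _ in self.secondary]

    def depth_at(self, pos):
        """Random access: direct primary lookup, binary search only for overflow bases."""
        code = self.primary[pos]
        if code != self.sentinel:
            return code
        i = bisect.bisect_right(self.starts, pos) - 1
        return self.secondary[i][2]


def encode(depths, k):
    """Split a depth track into a k-bit primary array and a run-length secondary table."""
    sentinel = (1 << k) - 1
    primary, secondary = [], []
    for pos, d in enumerate(depths):
        if d < sentinel:
            primary.append(d)
            continue
        primary.append(sentinel)
        if secondary and secondary[-1][1] == pos and secondary[-1][2] == d:
            start, _, value = secondary[-1]
            secondary[-1] = (start, pos + 1, value)  # extend the current run
        else:
            secondary.append((pos, pos + 1, d))
    return Track(primary, secondary, sentinel)
```

Typical use would be `k = choose_bit_width(sample_windows(depths, 100, 1000))` followed by `track = encode(depths, k)` and `track.depth_at(pos)`. The design trade-off the sketch tries to capture is that a small fixed width keeps most lookups to a single array index, while the rare high-depth stretches are stored compactly as intervals instead of forcing every base to use a wider code.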

SUBMITTER: Hou H 

PROVIDER: S-EPMC9355464 | biostudies-literature | 2021 Jun

REPOSITORIES: biostudies-literature


Publications

Balancing efficient analysis and storage of quantitative genomics data with the D4 format and d4tools.

Hao Hou, Brent Pedersen, Aaron Quinlan

Nature Computational Science, 21 June 2021, issue 6



Similar Datasets

| S-EPMC5860110 | biostudies-literature
| S-EPMC8337000 | biostudies-literature
| S-EPMC7805039 | biostudies-literature
| S-EPMC5547444 | biostudies-other
| S-EPMC5574715 | biostudies-literature
| S-EPMC7058350 | biostudies-literature
| S-EPMC9259476 | biostudies-literature
| S-EPMC7465036 | biostudies-literature
| S-EPMC9360040 | biostudies-literature
| S-EPMC11369284 | biostudies-literature