
Dataset Information


Crumble: reference free lossy compression of sequence quality values.


ABSTRACT:

Motivation

Most of the space taken up by NGS CRAM files consists of per-base quality values. Most of these values are unnecessary for variant calling, which offers an opportunity to save space.

Results

On the Syndip test set, a 17-fold reduction in the quality storage portion of a CRAM file can be achieved while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2-fold to 7.4-fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details).
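The link between the 17-fold quality reduction and the 2.2-fold to 7.4-fold whole-file range can be illustrated with a little arithmetic. The sketch below is only a back-of-envelope model, not part of Crumble: it assumes the non-quality portion of the file stays exactly the same size, and the function names `overall_fold` and `quality_fraction_for` are hypothetical. The quality shares it prints are implied by that assumption, not figures reported in the paper.

```python
def overall_fold(quality_fraction: float, quality_fold: float = 17.0) -> float:
    """Whole-file size reduction when only the quality portion shrinks.

    quality_fraction: share of the original file occupied by quality values.
    quality_fold: reduction factor applied to that share (17 in the abstract).
    Assumption: everything that is not a quality value keeps its size.
    """
    if not 0.0 <= quality_fraction <= 1.0:
        raise ValueError("quality_fraction must be between 0 and 1")
    remaining = quality_fraction / quality_fold + (1.0 - quality_fraction)
    return 1.0 / remaining


def quality_fraction_for(target_fold: float, quality_fold: float = 17.0) -> float:
    """Invert overall_fold: the quality share implied by a whole-file reduction."""
    return (1.0 - 1.0 / target_fold) * quality_fold / (quality_fold - 1.0)


# Quality shares that would produce the two ends of the reported range.
for target in (2.2, 7.4):
    share = quality_fraction_for(target)
    print(f"{target}-fold overall implies a quality share of about {share:.0%}")
```

Under this simplified model, a file dominated by quality values approaches the full 17-fold saving, while a file with a large non-quality share gains far less, which is consistent with the dependence on non-quality content described above.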

Availability and implementation

Crumble is open source and can be obtained from https://github.com/jkbonfield/crumble.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Bonfield JK 

PROVIDER: S-EPMC6330002 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature


Publications

Crumble: reference free lossy compression of sequence quality values.

Bonfield JK, McCarthy SA, Durbin R

Bioinformatics (Oxford, England), 2019-01-01, issue 2



Similar Datasets

| S-EPMC5856090 | biostudies-other
| S-EPMC5568552 | biostudies-literature
| S-EPMC5862240 | biostudies-literature
| S-EPMC7372835 | biostudies-literature
| S-EPMC8507489 | biostudies-literature
| S-EPMC8907274 | biostudies-literature
| S-EPMC7336184 | biostudies-literature
| S-EPMC5946873 | biostudies-literature
| S-EPMC7842218 | biostudies-literature
| S-EPMC6873394 | biostudies-literature