ABSTRACT:
Motivation
The bulk of space taken up by NGS sequencing CRAM files consists of per-base quality values. Most of these are unnecessary for variant calling, offering an opportunity for space saving.
Results
On the Syndip test set, a 17-fold reduction in the quality storage portion of a CRAM file can be achieved while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2- to 7.4-fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details).
Availability and implementation
Crumble is open source and can be obtained from https://github.com/jkbonfield/crumble.
Supplementary information
Supplementary data are available at Bioinformatics online.
SUBMITTER: Bonfield JK
PROVIDER: S-EPMC6330002 | biostudies-literature | 2019 Jan
REPOSITORIES: biostudies-literature
Bonfield JK, McCarthy SA, Durbin R
Bioinformatics (Oxford, England), 2019-01-01, issue 2