Dataset Information

ACO: lossless quality score compression based on adaptive coding order.


ABSTRACT:

Background

With the rapid development of high-throughput sequencing technology, the cost of whole-genome sequencing has dropped rapidly, leading to exponential growth of genome data. Efficiently compressing the DNA data generated by large-scale genome projects has become an important factor limiting the further development of the DNA sequencing industry. Although the compression of DNA bases has improved significantly in recent years, the compression of quality scores remains challenging.

Results

In this paper, by reinvestigating the inherent correlations between quality scores and the sequencing process, we propose a novel lossless quality score compressor based on adaptive coding order (ACO). The main objective of ACO is to traverse the quality scores adaptively along the most correlated trajectory according to the sequencing process. Combined with adaptive arithmetic coding and an improved in-context strategy, ACO achieves state-of-the-art quality score compression performance with moderate complexity for next-generation sequencing (NGS) data.
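As a rough illustration of why context matters for adaptive arithmetic coding of quality scores, the sketch below estimates the ideal code length of a symbol sequence under a simple adaptive, Laplace-smoothed context model. It is not ACO's actual model or coding order; the function name `adaptive_code_length`, its parameters, and the toy input are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def adaptive_code_length(symbols, order=1, alphabet_size=64):
    """Return the ideal code length in bits that an adaptive arithmetic
    coder would approach when each symbol is predicted from the `order`
    preceding symbols (hypothetical helper, not part of ACO)."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, s in enumerate(symbols):
        # Context is the previous `order` symbols (shorter at the start).
        ctx = tuple(symbols[max(0, i - order):i])
        # Laplace-smoothed probability from the counts seen so far.
        p = (counts[ctx][s] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)
        # Update the model after coding, as a decoder could do the same.
        counts[ctx][s] += 1
        totals[ctx] += 1
    return bits


# Toy sequence with strong dependence on the previous symbol.
toy = [0, 1, 2, 3] * 250
bits_no_context = adaptive_code_length(toy, order=0, alphabet_size=4)
bits_with_context = adaptive_code_length(toy, order=1, alphabet_size=4)
```

On this toy input the order-1 model needs far fewer bits than the order-0 model, because each symbol is fully determined by its predecessor. A traversal order that places strongly correlated scores next to each other would benefit a context model in the same way.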

Conclusions

This performance makes ACO a candidate tool for quality score compression. ACO has been adopted by AVS (Audio Video coding Standard Workgroup of China) and is freely available at https://github.com/Yoniming/ACO.

SUBMITTER: Niu Y 

PROVIDER: S-EPMC9175485 | biostudies-literature | 2022 Jun

REPOSITORIES: biostudies-literature

Publications

ACO: lossless quality score compression based on adaptive coding order.

Niu Yi, Ma Mingming, Li Fu, Liu Xianming, Shi Guangming

BMC Bioinformatics, 2022-06-07, issue 1



Similar Datasets

| S-EPMC7517294 | biostudies-literature
| S-EPMC7165212 | biostudies-literature
| S-EPMC6873394 | biostudies-literature
| S-EPMC7079445 | biostudies-literature
| S-EPMC8271783 | biostudies-literature
| S-EPMC10098950 | biostudies-literature
| S-EPMC3592443 | biostudies-literature
| S-EPMC10909184 | biostudies-literature
| S-EPMC6707603 | biostudies-literature
| S-EPMC6761962 | biostudies-literature