
Dataset Information


CoLoRd: compressing long reads.


ABSTRACT: The cost of maintaining exabytes of data produced by sequencing experiments every year has become a major issue in today's genomic research. In spite of the increasing popularity of third-generation sequencing, the existing algorithms for compressing long reads exhibit a minor advantage over the general-purpose gzip. We present CoLoRd, an algorithm able to reduce the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.

SUBMITTER: Kokot M 

PROVIDER: S-EPMC9337911 | biostudies-literature | 2022 Apr

REPOSITORIES: biostudies-literature


Publications

CoLoRd: compressing long reads.

Marek Kokot, Adam Gudyś, Heng Li, Sebastian Deorowicz

Nature Methods, 2022 Mar 28, issue 4



Similar Datasets

| S-EPMC7168855 | biostudies-literature
| S-EPMC6506941 | biostudies-literature
| S-EPMC7671326 | biostudies-literature
| S-EPMC7419660 | biostudies-literature
| S-EPMC6547545 | biostudies-literature
| S-EPMC7759537 | biostudies-literature
| S-EPMC8665758 | biostudies-literature
| S-EPMC4126851 | biostudies-literature
| S-EPMC7504856 | biostudies-literature
| S-EPMC6902338 | biostudies-literature