Dataset Information

VeChat: correcting errors in long reads using variation graphs.


ABSTRACT: Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes present in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat as an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool, publicly available at https://github.com/HaploKit/vechat.

SUBMITTER: Luo X 

PROVIDER: S-EPMC9636371 | biostudies-literature | 2022 Nov

REPOSITORIES: biostudies-literature

Publications

VeChat: correcting errors in long reads using variation graphs.

Luo X, Kang X, Schönhuth A

Nature Communications, 2022 Nov 4 (issue 1)



Similar Datasets

| S-EPMC5351550 | biostudies-literature
| S-EPMC6204047 | biostudies-literature
| S-EPMC10423031 | biostudies-literature
| S-EPMC6612831 | biostudies-literature
| S-EPMC5206522 | biostudies-literature
| S-EPMC6218980 | biostudies-literature
| S-EPMC10541625 | biostudies-literature
| S-EPMC10690975 | biostudies-literature
| S-EPMC4635656 | biostudies-literature
| S-EPMC8574707 | biostudies-literature