
Dataset Information


A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads.


ABSTRACT: The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, opens new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics because of their long reads. However, the high error rate and poor quality of TGS reads pose new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are needed to prioritize high-quality reads and thereby improve the results of error correction and assembly. In this study, we propose a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data of high-quality and low-quality reads, which are characterized by their nucleotide combinations. A linear regression model is then built to score the quality of reads. The method was tested on three datasets from different species. The results show that the top-scored reads prioritized by REQUEST achieve higher alignment accuracies. Contig assemblies based on the top-scored reads also outperform conventional approaches that use all reads. REQUEST can distinguish high-quality reads from low-quality ones without a reference genome, making it a promising sequence-based alternative to alignment-based quality evaluation.
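The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the REQUEST implementation. It assumes that "nucleotide combinations" can be approximated by normalized k-mer frequencies, that high-quality and low-quality training reads are labeled 1 and 0, and that the linear model is fitted by plain gradient descent; none of these details are given in the abstract. The function names `kmer_features`, `fit_linear`, and `score_read` are hypothetical.

```python
from itertools import product


def kmer_features(read, k=2):
    """Normalized k-mer frequencies over ACGT (assumed stand-in for 'nucleotide combinations')."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def fit_linear(X, y, lr=0.5, epochs=2000):
    """Fit y ~ w.x + b by batch gradient descent on the mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def score_read(read, w, b, k=2):
    """Score a read with the fitted linear model; higher means 'more like the high-quality set'."""
    return sum(wj * xj for wj, xj in zip(w, kmer_features(read, k))) + b


# Toy training data (illustrative only, not from the study).
high_quality = ["ACGTACGTACGT", "CGTACGTACGTA"]
low_quality = ["AAAAAAAAAAAA", "AAAATTTTTTTT"]
X = [kmer_features(r) for r in high_quality + low_quality]
y = [1.0] * len(high_quality) + [0.0] * len(low_quality)
w, b = fit_linear(X, y)

# Rank unseen reads by score, best first.
reads = ["TTTTTTTTAAAA", "GTACGTACGTAC"]
ranked = sorted(reads, key=lambda r: score_read(r, w, b), reverse=True)
```

Ranking by score and keeping only the top fraction mirrors the read-prioritization step the abstract describes; how the training labels are obtained without a reference genome is not covered by this sketch.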

SUBMITTER: Zhang W 

PROVIDER: S-EPMC6356754 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature


Publications

A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads.

Zhang W, Huang N, Zheng J, Liao X, Wang J, Li HD

Genes, 2019 Jan 14, issue 1



Similar Datasets

| S-EPMC3581251 | biostudies-literature
| S-EPMC8033682 | biostudies-literature
| S-EPMC5964631 | biostudies-literature
| S-EPMC3096631 | biostudies-literature
| S-EPMC2825236 | biostudies-literature
| S-EPMC7109398 | biostudies-literature
| S-EPMC5004134 | biostudies-literature
| S-EPMC6678717 | biostudies-literature
| S-EPMC7144081 | biostudies-literature
| S-EPMC5261608 | biostudies-literature