Dataset Information

A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads.

ABSTRACT: The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics because of their long reads. However, the high error rate and poor quality of TGS reads pose new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are needed to prioritize high-quality reads and thereby improve the results of error correction and assembly. In this study, we propose a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data consisting of high-quality and low-quality reads, which are characterized by their nucleotide combinations. A linear regression model was built to score the quality of reads. The method was tested on three datasets from different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. Contig assemblies built from the top-scored reads also outperformed conventional approaches that use all reads. REQUEST can distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising alternative to alignment-based algorithms for sequence-quality evaluation.
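
The scoring idea in the abstract (reads described by their nucleotide combinations, then scored by a linear regression model) can be illustrated with a minimal sketch. This is not the REQUEST implementation: the k-mer size of 2, the toy reads, the 1.0/0.0 quality labels, and the function names kmer_frequencies, fit_linear, and score are all assumptions made for illustration, and the regression is fitted with plain gradient descent rather than whatever solver the authors used.

```python
from itertools import product

K = 2  # assumed k-mer size; the abstract does not state one
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(read):
    """Return the normalized K-mer frequency vector of a read."""
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(read) - K + 1):
        j = INDEX.get(read[i:i + K].upper())
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def fit_linear(X, y, lr=0.5, epochs=500):
    """Fit y ~ w.x + b by batch gradient descent on squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(read, w, b):
    """Predicted quality score of a read under the fitted model."""
    return sum(wj * xj for wj, xj in zip(w, kmer_frequencies(read))) + b


# Toy training set (hypothetical): label 1.0 = high quality, 0.0 = low quality.
train_reads = [
    ("ACGTACGTACGTACGT", 1.0),
    ("CGTACGTACGTACGTA", 1.0),
    ("AAAAAAAATTTTTTTT", 0.0),
    ("AAAATTTTAAAATTTT", 0.0),
]
X = [kmer_frequencies(r) for r, _ in train_reads]
y = [label for _, label in train_reads]
w, b = fit_linear(X, y)

# Rank unseen reads by predicted score, highest first.
candidates = ["TTTTTTAAAAAA", "GTACGTACGTAC"]
ranked = sorted(candidates, key=lambda r: score(r, w, b), reverse=True)
print(ranked)
```

Sorting by score and keeping only the head of the list mirrors the "top-scored reads" selection described above; how many reads to keep is a separate choice that the abstract does not specify.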

SUBMITTER: Zhang W

PROVIDER: S-EPMC6356754 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature

Publications

A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads.

Zhang Wenjing, Huang Neng, Zheng Jiantao, Liao Xingyu, Wang Jianxin, Li Hong-Dong

Genes, 2019-01-14, issue 1

PMID: 30646604

Similar Datasets

Alignment-free sequence comparison based on next-generation sequencing reads. | S-EPMC3581251 | biostudies-literature

Improving protein domain classification for third-generation sequencing reads using deep learning. | S-EPMC8033682 | biostudies-literature

CoverView: a sequence quality evaluation tool for next generation sequencing data. | S-EPMC5964631 | biostudies-literature

Comparison of sequence reads obtained from three next-generation sequencing platforms. | S-EPMC3096631 | biostudies-literature

Identification of novel non-coding RNAs using profiles of short sequence reads from next generation sequencing data. | S-EPMC2825236 | biostudies-literature

MASQC: Next Generation Sequencing Assists Third Generation Sequencing for Quality Control in N6-Methyladenine DNA Identification. | S-EPMC7109398 | biostudies-literature

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies. | S-EPMC5004134 | biostudies-literature

TGStools: A Bioinformatics Suit to Facilitate Transcriptome Analysis of Long Reads from Third Generation Sequencing Platform. | S-EPMC6678717 | biostudies-literature

LongQC: A Quality Control Tool for Third Generation Sequencing Long Read Data. | S-EPMC7144081 | biostudies-literature

Novel Primer Sets for Next Generation Sequencing-Based Analyses of Water Quality. | S-EPMC5261608 | biostudies-literature