Dataset Information

Improved transcriptome assembly using a hybrid of long and short reads with StringTie.


ABSTRACT: Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.

SUBMITTER: Shumate A 

PROVIDER: S-EPMC9191730 | biostudies-literature | 2022 Jun

REPOSITORIES: biostudies-literature

Publications

Improved transcriptome assembly using a hybrid of long and short reads with StringTie.

Alaina Shumate, Brandon Wong, Geo Pertea, Mihaela Pertea

PLoS Computational Biology, 2022-06-01, issue 6


Similar Datasets

| S-EPMC4907386 | biostudies-literature
| S-EPMC5518131 | biostudies-literature
| S-EPMC4582210 | biostudies-literature
| S-EPMC7419660 | biostudies-literature
| S-EPMC9750119 | biostudies-literature
| S-EPMC7850483 | biostudies-literature
2023-10-14 | GSE215357 | GEO
2023-10-14 | GSE215355 | GEO
| S-EPMC4374946 | biostudies-other
| S-EPMC10629667 | biostudies-literature