Dataset Information

The impacts of read length and transcriptome complexity for de novo assembly: a simulation study.


ABSTRACT: Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have pre-assembly factors, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition for further improving transcriptome assembly, but it is unknown whether read length really matters. In addition, though many assembly tools are now available, it is unclear whether the existing assemblers perform well enough on data with different transcriptome complexities. In this paper, we studied these two open problems using two high-performing assemblers, Velvet/Oases and Trinity, on several simulated datasets of human, mouse and S. cerevisiae. The results suggest that (1) the read length of paired reads does not matter once it exceeds a certain threshold, and interestingly, the threshold differs between organisms; (2) the quality of de novo assembly decreases sharply as transcriptome complexity increases, and all existing de novo assemblers tend to fail whenever the genes contain a large number of alternative splicing events.

SUBMITTER: Chang Z 

PROVIDER: S-EPMC3988101 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

The impacts of read length and transcriptome complexity for de novo assembly: a simulation study.

Chang Zheng, Wang Zhenjia, Li Guojun

PLoS One, 2014-04-15, issue 4


Similar Datasets

| S-EPMC3287467 | biostudies-literature
| S-EPMC6289447 | biostudies-literature
| S-EPMC3485621 | biostudies-literature
| S-EPMC3749127 | biostudies-literature
| S-EPMC3223268 | biostudies-literature
| S-EPMC6511074 | biostudies-literature
| S-EPMC8590762 | biostudies-literature
| S-EPMC3288049 | biostudies-literature
| S-EPMC2336801 | biostudies-literature
| S-EPMC3968166 | biostudies-literature