
Dataset Information


DTA-SiST: de novo transcriptome assembly by using simplified suffix trees.


ABSTRACT: BACKGROUND: Alternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins. High-throughput sequencing of mRNAs has revolutionized our ability to reconstruct transcripts. However, the massive volume of short reads makes de novo transcript assembly an algorithmic challenge. RESULTS: We develop a radically new framework, called DTA-SiST, for de novo transcriptome assembly based on suffix trees. DTA-SiST first extends contigs with the reads that have the longest overlaps with the contigs' ends. These reads can be found in time linear in the lengths of the reads through a well-designed suffix tree structure. Then, DTA-SiST constructs splicing graphs from the contigs for each gene locus. Finally, DTA-SiST offers two strategies for extracting transcript-representing paths: a depth-first enumeration strategy and a hybrid strategy based on length and coverage. We implemented both strategies and compared them with state-of-the-art de novo assemblers on both simulated and real datasets. Experimental results showed that the depth-first enumeration strategy always achieves better recall, and also better precision on smaller datasets, while the hybrid strategy leads in precision on large datasets. CONCLUSIONS: DTA-SiST is more competitive than the other de novo assemblers compared, especially in precision, owing to its read-based contig extension strategy and its elegant transcript extraction rules.
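The two steps named in the abstract, extending a contig with the read that has the longest overlap and enumerating paths depth-first through a splicing graph, can be illustrated with a minimal Python sketch. This is not the DTA-SiST implementation: the overlap search below is a naive scan rather than the suffix tree the authors describe, the graph is assumed to be acyclic, and every name (`longest_overlap`, `extend_right`, `enumerate_paths`, `min_len`) is hypothetical.

```python
from typing import Dict, List


def longest_overlap(contig: str, read: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `contig` that equals a prefix of `read`.

    Naive scan for illustration; the paper uses a suffix tree for this lookup.
    """
    max_len = min(len(contig), len(read))
    for k in range(max_len, min_len - 1, -1):
        if contig.endswith(read[:k]):
            return k
    return 0


def extend_right(contig: str, reads: List[str], min_len: int = 3) -> str:
    """Greedily extend `contig` rightwards with the best-overlapping unused read."""
    unused = list(reads)
    while True:
        best_k, best_i = 0, -1
        for i, read in enumerate(unused):
            k = longest_overlap(contig, read, min_len)
            # Only reads that add at least one new base count as an extension.
            if best_k < k < len(read):
                best_k, best_i = k, i
        if best_i < 0:
            return contig
        contig += unused.pop(best_i)[best_k:]


def enumerate_paths(graph: Dict[str, List[str]], sources: List[str]) -> List[List[str]]:
    """Depth-first enumeration of every source-to-sink path (assumes a DAG)."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        path.append(node)
        successors = graph.get(node, [])
        if not successors:  # sink node: one candidate transcript path
            paths.append(list(path))
        for nxt in successors:
            dfs(nxt, path)
        path.pop()

    for source in sources:
        dfs(source, [])
    return paths
```

For example, `extend_right("ACGTAC", ["TACGGA", "GGATTC", "CCCCCC"])` chains the first two reads onto the contig and ignores the third, and `enumerate_paths({"e1": ["e2", "e3"], "e2": ["e3"], "e3": []}, ["e1"])` yields one path that includes the middle node and one that skips it, which is how alternative isoforms appear in a splicing graph.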

SUBMITTER: Zhao J 

PROVIDER: S-EPMC6929406 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature


Publications

DTA-SiST: de novo transcriptome assembly by using simplified suffix trees.

Jin Zhao, Haodi Feng, Daming Zhu, Chi Zhang, Ying Xu

BMC Bioinformatics, 2019 Dec 24, Suppl 25



Similar Datasets

| S-EPMC3910276 | biostudies-literature
| S-EPMC5200869 | biostudies-literature
| S-EPMC4892416 | biostudies-literature
| S-EPMC4342890 | biostudies-literature
| S-EPMC3951189 | biostudies-literature
| S-EPMC4695645 | biostudies-literature
| S-EPMC4134189 | biostudies-literature
| S-EPMC3485621 | biostudies-literature
| S-EPMC4537571 | biostudies-literature
| S-EPMC7011030 | biostudies-literature