Dataset Information

Assessing De Novo transcriptome assembly metrics for consistency and utility.

ABSTRACT:

Background

Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. Unfortunately, it is still unclear which of these metrics accurately reflect assembly quality.

Results

We simulated sequencing transcripts of Drosophila melanogaster. By assembling these simulated reads using both a "perfect" and a modern transcriptome assembler while varying read length and sequencing depth, we evaluated quality metrics to determine whether they 1) revealed perfect assemblies to be of higher quality, and 2) revealed perfect assemblies to be more complete as data quantity increased. Several commonly used metrics were not consistent with these expectations, including average contig coverage and length, though they became consistent when singletons were included in the analysis. We found several annotation-based metrics to be consistent and informative, including contig reciprocal best hit count and contig unique annotation count. Finally, we evaluated a number of novel metrics such as reverse annotation count, contig collapse factor, and the ortholog hit ratio, discovering that each assesses assembly quality in unique ways.
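As an illustration of one annotation-based metric named above, the following is a minimal sketch of a contig reciprocal best hit count. It assumes the general definition of a reciprocal best hit (a contig and a reference sequence that are each other's top-scoring match) and is not taken from the paper's own pipeline. The function name, the dictionary inputs, and the contig and gene identifiers are all hypothetical; in practice the two best-hit maps would be derived from alignment results in each direction.

```python
from typing import Dict


def reciprocal_best_hit_count(
    contig_to_ref: Dict[str, str],
    ref_to_contig: Dict[str, str],
) -> int:
    """Count contig/reference pairs that are each other's best hit.

    contig_to_ref maps each contig to its best-scoring reference sequence.
    ref_to_contig maps each reference sequence to its best-scoring contig.
    Both maps are assumed inputs, e.g. parsed from alignment output.
    """
    return sum(
        1
        for contig, ref in contig_to_ref.items()
        if ref_to_contig.get(ref) == contig
    )


# Hypothetical best-hit tables for three contigs and three reference genes.
contig_best = {"contig1": "geneA", "contig2": "geneA", "contig3": "geneB"}
ref_best = {"geneA": "contig1", "geneB": "contig3", "geneC": "contig2"}

rbh = reciprocal_best_hit_count(contig_best, ref_best)
print(rbh)
```

In this toy input, contig2 also hits geneA, but geneA's best contig is contig1, so only two pairs are counted. This is one reason such a count might penalize fragmented or redundant assemblies, where several contigs compete for the same reference sequence.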

Conclusions

Although much attention has been given to transcriptome assembly, little research has focused on determining how best to evaluate assemblies, particularly in light of the variety of options available for read length and sequencing depth. Our results provide an important review of these metrics and give researchers tools to produce the highest quality transcriptome assemblies.

SUBMITTER: O'Neil ST

PROVIDER: S-EPMC3733778 | biostudies-literature | 2013 Jul

REPOSITORIES: biostudies-literature

Publications

Assessing De Novo transcriptome assembly metrics for consistency and utility.

Shawn T. O'Neil, Scott J. Emrich

BMC Genomics, 2013-07-09

PMID: 23837739

Similar Datasets

De novo transcriptome assembly of shrimp Palaemon serratus. | S-EPMC5200869 | biostudies-literature

Informed kmer selection for de novo transcriptome assembly. | S-EPMC4892416 | biostudies-literature

Metrics for assessing cytoskeletal orientational correlations and consistency. | S-EPMC4388480 | biostudies-literature

De novo transcriptome assembly of two different apricot cultivars. | S-EPMC4664767 | biostudies-literature

De novo transcriptome assembly of Sorghum bicolor variety Taejin. | S-EPMC4878842 | biostudies-literature

A Bayesian approach for accurate de novo transcriptome assembly. | S-EPMC8417280 | biostudies-literature

De novo transcriptome assembly of Setatria italica variety Taejin. | S-EPMC4878839 | biostudies-literature

Effect of de novo transcriptome assembly on transcript quantification. | S-EPMC6549443 | biostudies-literature

De novo transcriptome assembly of two contrasting pumpkin cultivars. | S-EPMC4778644 | biostudies-literature

De novo transcriptome assembly of the mycoheterotrophic plant Monotropa hypopitys. | S-EPMC5154972 | biostudies-literature