
Dataset Information


Sequence comparative analysis using networks: software for evaluating de novo transcript assembly from next-generation sequencing.


ABSTRACT: DNA sequencing technology is becoming more accessible to a variety of researchers as costs continue to decline. As researchers begin to sequence novel transcriptomes, most of these data sets lack a reference genome and will have to rely on de novo assemblers. Making comparisons across assemblies can be difficult: each program has its strengths and weaknesses, and no tool exists to comparatively evaluate these data sets. We developed software in R, called Sequence Comparative Analysis using Networks (SCAN), to perform statistical comparisons between distinct assemblies. SCAN uses a reference data set to identify the most accurate de novo assembly and the "good" transcripts in the user's data. We tested SCAN on three publicly available transcriptomes, each assembled using three assembly programs. Moreover, we sequenced the transcriptome of the oomycete Achlya hypogyna and compared de novo assemblies from the Velvet, ABySS, and CLC Genomics Workbench assembly algorithms. In total, 1,128 of the CLC transcripts were statistically similar to the reference, compared with 49 of the Velvet transcripts and 937 of the ABySS transcripts. SCAN's strength is providing statistical support for transcript assemblies in a biological context. Because SCAN is designed to compare distinct node sets in networks, it can also easily be extended to perform statistical comparisons on any network graph, regardless of what the nodes represent.
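The abstract does not describe SCAN's actual statistics, so the following is only a hypothetical illustration of the general idea of comparing distinct node sets in a network. It assumes a similarity network in which each node is a transcript tagged with its source ("reference" or an assembler name) and edges are, for example, sequence-similarity hits. The functions `reference_linked`, `flags_for`, and `permutation_test` are invented for this sketch, and the permutation test is a generic stand-in, not SCAN's method. The toy data are made up and unrelated to the reported results.

```python
import random
from collections import defaultdict


def reference_linked(edges, node_set):
    """Hypothetical helper: for each non-reference node set, collect the
    nodes that share at least one edge with a 'reference' node."""
    linked = defaultdict(set)
    for u, v in edges:
        su, sv = node_set[u], node_set[v]
        if su == "reference" and sv != "reference":
            linked[sv].add(v)
        elif sv == "reference" and su != "reference":
            linked[su].add(u)
    return dict(linked)


def flags_for(set_name, node_set, linked):
    """Return a 0/1 flag per node of one set (sorted by node ID):
    1 if the node is linked to the reference, else 0."""
    members = sorted(n for n, s in node_set.items() if s == set_name)
    hits = linked.get(set_name, set())
    return [1 if n in hits else 0 for n in members]


def permutation_test(flags_a, flags_b, n_perm=10_000, seed=0):
    """Generic two-sided permutation test for a difference in the
    proportion of reference-linked nodes between two node sets."""
    rng = random.Random(seed)
    observed = abs(sum(flags_a) / len(flags_a) - sum(flags_b) / len(flags_b))
    pooled = list(flags_a) + list(flags_b)
    n_a = len(flags_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of nodes between the two sets
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Made-up toy network: node -> source set, plus undirected similarity edges.
node_set = {
    "r1": "reference", "r2": "reference",
    "c1": "CLC", "c2": "CLC", "c3": "CLC",
    "v1": "Velvet", "v2": "Velvet", "v3": "Velvet",
}
edges = [("c1", "r1"), ("c2", "r2"), ("r1", "c3"), ("v1", "r2"), ("c1", "v1")]

linked = reference_linked(edges, node_set)
clc_flags = flags_for("CLC", node_set, linked)
velvet_flags = flags_for("Velvet", node_set, linked)
p_value = permutation_test(clc_flags, velvet_flags)
print(clc_flags, velvet_flags, p_value)
```

Edges between two non-reference nodes (such as `c1`-`v1`) are ignored here because only links to the reference count as support; a real tool would likely weight edges and correct for multiple comparisons, which this sketch does not attempt.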

SUBMITTER: Misner I 

PROVIDER: S-EPMC3708500 | biostudies-literature | 2013 Aug

REPOSITORIES: biostudies-literature


Publications

Sequence comparative analysis using networks: software for evaluating de novo transcript assembly from next-generation sequencing.

Ian Misner, Cédric Bicep, Philippe Lopez, Sébastien Halary, Eric Bapteste, Christopher E. Lane

Molecular Biology and Evolution, issue 8 (2013-05-10)



Similar Datasets

| S-EPMC3137213 | biostudies-literature
| S-EPMC3056720 | biostudies-literature
| S-EPMC2945192 | biostudies-literature
| S-EPMC3288049 | biostudies-literature
| S-EPMC3526293 | biostudies-literature
| S-EPMC3639258 | biostudies-literature
| S-EPMC3726674 | biostudies-literature
| S-EPMC6549443 | biostudies-literature
| S-EPMC3272011 | biostudies-literature
| S-EPMC5572715 | biostudies-literature