
Dataset Information


Simplifier: a web tool to eliminate redundant NGS contigs.


ABSTRACT: Modern genomic sequencing technologies produce large amounts of data at a reduced cost per base; however, these data consist of short reads. The shorter read length, compared with reads obtained by previous methodologies, presents new challenges, including the need for efficient algorithms that assemble genomes from short reads and resolve repeats. Additionally, after ab initio assembly, curation of the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, a stand-alone program that selectively eliminates redundant sequences from the collection of contigs generated by ab initio genome assembly. Applying Simplifier to the assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the contig set from the mate-paired library by 17.47% and that from the fragment library by 23.91%. In tests with data from prokaryotic organisms, Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required to finalize the genome assembly. AVAILABILITY: Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml and requires Sun JDK 6 or higher.
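
To make the two quantities in the abstract concrete (redundant contigs and N50), here is a minimal Python sketch. It is not Simplifier's implementation, which is a Java program whose algorithm is not described in this record. It assumes, purely for illustration, that a contig is "redundant" when it is an exact substring of another retained contig on either strand. The function names `n50`, `reverse_complement`, and `drop_contained`, and the toy sequences, are hypothetical.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only; other symbols pass through)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def drop_contained(contigs):
    """Illustrative redundancy filter (an assumption, not Simplifier's rule):
    drop any contig that occurs exactly inside an already-kept contig,
    on the forward or reverse strand. Longest contigs are considered first."""
    kept = []
    for seq in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(seq)
        if any(seq in k or rc in k for k in kept):
            continue  # exact duplicate or fully contained: redundant
        kept.append(seq)
    return kept


# Toy data: one duplicate, one forward-strand substring,
# one reverse-strand substring, and one unrelated contig.
contigs = ["ACGTACGTGG", "CGTACG", "CCACGT", "TTTTGG", "ACGTACGTGG"]
reduced = drop_contained(contigs)
print(len(contigs), "->", len(reduced), "contigs")
print("N50 before:", n50([len(c) for c in contigs]))
print("N50 after: ", n50([len(c) for c in reduced]))
```

The pairwise substring check is quadratic in the number of contigs, which is acceptable for a toy example but would need an index (for example, k-mer based) on assemblies with thousands of contigs such as those described above.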

SUBMITTER: Ramos RT 

PROVIDER: S-EPMC3524941 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

Simplifier: a web tool to eliminate redundant NGS contigs.

Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Azevedo, Vasco; Schneider, Maria Paula; Barh, Debmalya; Silva, Artur

Bioinformation, 2012-10-13, 20



Similar Datasets

| S-EPMC6829986 | biostudies-literature
| S-EPMC9252826 | biostudies-literature
| S-EPMC7203750 | biostudies-literature
2024-03-24 | GSE228266 | GEO
| S-EPMC10450166 | biostudies-literature
| S-EPMC9895803 | biostudies-literature
| S-EPMC4670531 | biostudies-literature
| S-EPMC4071203 | biostudies-literature
| S-EPMC7678096 | biostudies-literature
2020-05-01 | GSE145506 | GEO