Dataset Information

Evaluation of alignment algorithms for discovery and identification of pathogens using RNA-Seq.


ABSTRACT: Next-generation sequencing technologies provide an unparalleled opportunity for the characterization and discovery of known and novel viruses. Because viruses are known to have the highest mutation rates when compared to eukaryotic and bacterial organisms, we assess the extent to which eleven well-known alignment algorithms (BLAST, BLAT, BWA, BWA-SW, BWA-MEM, BFAST, Bowtie2, Novoalign, GSNAP, SHRiMP2 and STAR) can be used for characterizing mutated and non-mutated viral sequences--including those that exhibit RNA splicing--in transcriptome samples. To evaluate aligners objectively we developed a realistic RNA-Seq simulation and evaluation framework (RiSER) and propose a new combined score to rank aligners for viral characterization in terms of their precision, sensitivity and alignment accuracy. We used RiSER to simulate both human and viral read sequences and suggest the best set of aligners for viral sequence characterization in human transcriptome samples. Our results show that significant and substantial differences exist between aligners and that a digital-subtraction-based viral identification framework can and should use different aligners for different parts of the process. We determine the extent to which mutated viral sequences can be effectively characterized and show that more sensitive aligners such as BLAST, BFAST, SHRiMP2, BWA-SW and GSNAP can accurately characterize substantially divergent viral sequences with up to 15% overall sequence mutation rate. We believe that the results presented here will be useful to researchers choosing aligners for viral sequence characterization using next-generation sequencing data.
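
The abstract ranks aligners by precision and sensitivity, among other criteria. As a rough illustration only, the sketch below computes those two quantities from their standard textbook definitions and sorts aligners by one of them. It is not the RiSER framework and does not reproduce the paper's combined score, whose formula is not given in this record. The class name, function names, and all counts are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlignmentCounts:
    """Hypothetical per-aligner tallies from a simulated read set."""
    true_positives: int   # viral reads aligned to a viral reference
    false_positives: int  # non-viral reads aligned to a viral reference
    false_negatives: int  # viral reads that were not aligned to a viral reference


def precision(c: AlignmentCounts) -> float:
    """Standard precision: TP / (TP + FP); 0.0 when nothing was aligned."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def sensitivity(c: AlignmentCounts) -> float:
    """Standard sensitivity (recall): TP / (TP + FN); 0.0 when there are no viral reads."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def rank_by(counts: Dict[str, AlignmentCounts],
            metric: Callable[[AlignmentCounts], float]) -> List[str]:
    """Return aligner names sorted from highest to lowest value of `metric`."""
    return sorted(counts, key=lambda name: metric(counts[name]), reverse=True)


# Made-up counts for two placeholder aligners; not results from the paper.
example_counts = {
    "aligner_a": AlignmentCounts(true_positives=90, false_positives=10, false_negatives=30),
    "aligner_b": AlignmentCounts(true_positives=60, false_positives=0, false_negatives=60),
}
ranking_by_sensitivity = rank_by(example_counts, sensitivity)
ranking_by_precision = rank_by(example_counts, precision)
```

With these made-up counts the two metrics disagree about which placeholder aligner ranks first, which shows in miniature why a single combined score is useful when comparing aligners.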

SUBMITTER: Borozan I 

PROVIDER: S-EPMC3813700 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Evaluation of alignment algorithms for discovery and identification of pathogens using RNA-Seq.

Ivan Borozan, Stuart N. Watt, Vincent Ferretti

PLoS ONE, 2013-10-30, issue 10


Similar Datasets

| S-EPMC3530550 | biostudies-literature
| S-EPMC4077321 | biostudies-literature
| S-EPMC3167048 | biostudies-literature
2018-02-06 | GSE110114 | GEO
2013-07-15 | E-MTAB-1728 | biostudies-arrayexpress
| S-EPMC6072070 | biostudies-literature
| S-EPMC10827116 | biostudies-literature
2009-11-24 | GSE15370 | GEO
| S-EPMC4263765 | biostudies-literature
2010-05-19 | E-GEOD-15370 | biostudies-arrayexpress