
Dataset Information


Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data.


ABSTRACT: The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs produced by assembly, which makes subsequent viral analysis difficult. Assembly algorithms were compared in combination with three pre-assembly filters: host-mapping subtraction using a short-read mapping tool, a k-mer frequency-based filter and a low-complexity filter. The approach was validated for viral discovery with Illumina data derived from naturally infected liver tissue and with simulated data. Applying these pre-assembly filters significantly reduced the number of assembled contigs (by up to 99.97%). This approach provides a validated method for maximizing viral contig size while reducing the total number of assembled contigs that require downstream analysis as putative viral nucleic acids.
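The abstract names three pre-assembly filters but this listing does not give the tools or parameters used. The Python sketch below is a toy illustration of how such filters could be chained on raw reads. It is not the authors' pipeline, and every function name and threshold in it is hypothetical. Host subtraction is approximated by k-mer containment against a host sequence rather than by a real short-read mapper. The k-mer frequency filter is modelled on median-k-mer-abundance ("digital normalization") downsampling, which is only one possible reading of "k-mer frequency-based filter". Low-complexity reads are flagged by the Shannon entropy of their trinucleotide composition.

```python
"""Toy pre-assembly read filter: a hedged sketch, not the published method.

Assumptions (illustrative only):
- host subtraction uses k-mer containment as a stand-in for read mapping;
- the k-mer frequency filter is a digital-normalization-style coverage cap;
- low complexity is judged by trinucleotide Shannon entropy.
"""
from collections import Counter
import math
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trinucleotide_entropy(seq: str) -> float:
    """Shannon entropy in bits of the overlapping 3-mer composition of seq."""
    tri = kmers(seq.upper(), 3)
    if not tri:
        return 0.0
    total = len(tri)
    return -sum((c / total) * math.log2(c / total) for c in Counter(tri).values())


def host_fraction(read: str, host_kmers: set, k: int) -> float:
    """Fraction of the read's k-mers that also occur in the host reference."""
    ks = kmers(read, k)
    if not ks:
        return 0.0
    return sum(km in host_kmers for km in ks) / len(ks)


def filter_reads(reads: Iterable[str], host: str, k: int = 21,
                 max_host_frac: float = 0.5, min_entropy: float = 3.0,
                 coverage_cap: int = 20) -> List[str]:
    """Apply host subtraction, low-complexity and k-mer abundance filters in turn.

    All thresholds are arbitrary placeholders, not values from the study.
    """
    host_kmers = set(kmers(host, k))
    seen = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        if host_fraction(read, host_kmers, k) > max_host_frac:
            continue  # mostly host k-mers: subtract before assembly
        if trinucleotide_entropy(read) < min_entropy:
            continue  # low complexity, e.g. a homopolymer or short repeat
        ks = kmers(read, k)
        if ks:
            abundances = sorted(seen[km] for km in ks)
            if abundances[len(abundances) // 2] >= coverage_cap:
                continue  # median k-mer already well covered: drop redundant read
            seen.update(ks)
        kept.append(read)
    return kept


if __name__ == "__main__":
    host = "GGGCCCTTTAAAGGGCCCTTTAAACCCGGGTTTAAA"  # made-up "host" sequence
    viral = "TTGACCATGGAGCTCAATCGCGTATGGCAC"      # made-up non-host read
    reads = [host[2:22], "A" * 30, viral]
    print(filter_reads(reads, host, k=5))  # only the non-host, complex read survives
```

Ordering the checks so that cheap host and complexity rejections happen before reads are counted keeps host and repeat k-mers out of the abundance table. A real pipeline would instead map reads against a full host genome and keep the unmapped ones. It would also apply the normalization step with a dedicated k-mer counting tool.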

SUBMITTER: Daly GM 

PROVIDER: S-EPMC4476701 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data.

Gordon M. Daly, Richard M. Leggett, William Rowe, Samuel Stubbs, Maxim Wilkinson, Ricardo H. Ramirez-Gonzalez, Mario Caccamo, William Bernal, Jonathan L. Heeney

PLoS One, 2015-06-22, issue 6



Similar Datasets

S-EPMC5037392 | biostudies-literature
S-EPMC3933208 | biostudies-other
S-EPMC3244418 | biostudies-literature
S-EPMC8464054 | biostudies-literature
S-EPMC3708773 | biostudies-literature
PXD003804 | Pride | 2017-04-03
S-EPMC8941893 | biostudies-literature
S-EPMC3633874 | biostudies-other
S-EPMC6556987 | biostudies-literature
S-EPMC3562067 | biostudies-literature