Dataset Information

TagDust2: a generic method to extract reads from sequencing data.

ABSTRACT: Arguably the most basic step in the analysis of next generation sequencing (NGS) data involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single-end and paired-end libraries, and of libraries containing unique molecular identifiers, is fully supported. Two additional post-processing steps are included to exclude known contaminants and filter out low-complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. Taken together, TagDust2 is a feature-rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net.

SUBMITTER: Lassmann T

PROVIDER: S-EPMC4384298 | biostudies-literature | 2015 Jan

REPOSITORIES: biostudies-literature

Publications

TagDust2: a generic method to extract reads from sequencing data.

Lassmann T

BMC Bioinformatics, 2015-01-28

PMID: 25627334

Similar Datasets

BatAlign: an incremental method for accurate alignment of sequencing reads.
| S-EPMC4652746 | biostudies-literature

Somatic variant analysis of linked-reads sequencing data with Lancet.
| S-EPMC8487631 | biostudies-literature

Konnector v2.0: pseudo-long reads from paired-end sequencing data.
| S-EPMC4582294 | biostudies-literature

Data-dependent bucketing improves reference-free compression of sequencing reads.
| S-EPMC4547610 | biostudies-literature

CNV-TV: a robust method to discover copy number variation from short sequencing reads.
| S-EPMC3679874 | biostudies-literature

3'READS+, a sensitive and accurate method for 3' end sequencing of polyadenylated RNA.
| S-EPMC5029459 | biostudies-literature

FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads.
| S-EPMC5451431 | biostudies-literature

Complete genome sequencing of Dehalococcoides sp. strain UCH007 using a differential reads picking method.
| S-EPMC4644273 | biostudies-literature

Genotype calling from next-generation sequencing data using haplotype information of reads.
| S-EPMC3493122 | biostudies-literature

Sequencing facility and DNA source associated patterns of virus-mappable reads in whole-genome sequencing data.
| S-EPMC7856238 | biostudies-literature