Dataset Information

TreeToReads - a pipeline for simulating raw reads from phylogenies.

ABSTRACT: BACKGROUND: Using phylogenomic analysis tools for tracking pathogens has become standard practice in academia, public health agencies, and large industries. Given the same raw read genomic data as input, several different approaches are used to infer a phylogenetic tree. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole genome alignment, and others. Each has advantages and disadvantages: some have been extensively validated, some are faster, and some have higher resolution. A few of these analysis approaches are well integrated into the regulatory process of US Federal agencies (e.g., the FDA's SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods for fully exploring how the many parameter values in each pipeline affect whether the correct phylogenetic tree is recovered. RESULTS: To resolve this problem, we offer a program, TreeToReads, which can generate raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparisons of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of reference genome, etc.) to assess the effects of various parameter values on correctly calling SNPs and reconstructing an accurate tree. CONCLUSIONS: Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.
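
The general idea in the abstract (mutate a genome down a known tree, then draw reads from each tip genome) can be illustrated with a toy Python sketch. This is not the TreeToReads implementation and does not use its interface. The function names (`mutate`, `evolve`, `sample_reads`), the nested-tuple tree encoding, and the treatment of a branch length as a per-site substitution probability are all assumptions made for illustration. The sketch ignores indels, repeated substitutions at one site, and sequencing error.

```python
import random

BASES = "ACGT"


def mutate(seq, branch_length, rng):
    """Substitute each site independently with probability `branch_length`.

    Simplifying assumption: the branch length is used directly as a per-site
    substitution probability, clipped to the range [0, 1].
    """
    p = min(max(branch_length, 0.0), 1.0)
    out = []
    for base in seq:
        if rng.random() < p:
            # Always pick a different base so that a "hit" is a real change.
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def evolve(node, seq, rng, tips=None):
    """Walk a (name, branch_length, children) tree and collect tip genomes."""
    if tips is None:
        tips = {}
    name, branch_length, children = node
    seq = mutate(seq, branch_length, rng)
    if not children:
        tips[name] = seq
    for child in children:
        evolve(child, seq, rng, tips)
    return tips


def sample_reads(genome, read_len, coverage, rng):
    """Draw error-free reads uniformly at random to roughly `coverage` depth."""
    n_reads = (len(genome) * coverage) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads


# Hypothetical three-tip tree: ((B, C)anc, A)root, with made-up branch lengths.
tree = ("root", 0.0, [
    ("A", 0.01, []),
    ("anc", 0.02, [("B", 0.01, []), ("C", 0.01, [])]),
])

rng = random.Random(42)
root_genome = "".join(rng.choice(BASES) for _ in range(1000))
tip_genomes = evolve(tree, root_genome, rng)
reads_by_tip = {
    name: sample_reads(genome, read_len=100, coverage=20, rng=rng)
    for name, genome in tip_genomes.items()
}
```

Because the true tree and the true tip genomes are known, reads produced this way could be passed to a SNP-calling or tree-inference pipeline and the output compared against the truth while the branch lengths, coverage, or read length are varied.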

SUBMITTER: McTavish EJ

PROVIDER: S-EPMC5359950 | biostudies-literature | 2017 Mar

REPOSITORIES: biostudies-literature

Publications

TreeToReads - a pipeline for simulating raw reads from phylogenies.

McTavish EJ, Pettengill J, Davis S, Rand H, Strain E, Allard M, Timme RE

BMC Bioinformatics, 2017-03-20 (1)

PMID: 28320310

Similar Datasets

Ophiocordyceps sinensis raw sequence reads
2019-05-14 | GSE123085 | GEO

Oryza sativa Raw sequence reads
2015-04-22 | E-MTAB-4312 | biostudies-arrayexpress

Oryza sativa Raw sequence reads
2015-07-31 | E-MTAB-4347 | biostudies-arrayexpress

Populus tricho Raw sequence reads
2015-06-01 | E-MTAB-4364 | biostudies-arrayexpress

Arabidopsis thaliana Raw sequence reads
2015-06-20 | E-MTAB-4396 | biostudies-arrayexpress

Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies.
S-EPMC4580465 | biostudies-literature

SECAPR-a bioinformatics pipeline for the rapid and user-friendly processing of targeted enriched Illumina sequences, from raw reads to alignments.
S-EPMC6047508 | biostudies-literature

Automated reconstruction of whole-genome phylogenies from short-sequence reads.
S-EPMC3995342 | biostudies-literature

Cyprinus carpio 'koi' Raw sequence reads
2019-01-15 | GSE125039 | GEO

Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads.
S-EPMC7459848 | biostudies-literature