Dataset Information

GToTree: a user-friendly workflow for phylogenomics.


ABSTRACT: SUMMARY: Genome-level evolutionary inference (i.e. phylogenomics) is becoming an increasingly essential step in many biologists' work. Accordingly, there are several tools available for the major steps in a phylogenomics workflow. But for the biologist whose main focus is not bioinformatics, much of the computational work required (such as accessing genomic data on large scales, integrating genomes from different file formats, performing required filtering, stitching different tools together, etc.) can be prohibitive. Here I introduce GToTree, a command-line tool that can take any combination of fasta files, GenBank files and/or NCBI assembly accessions as input and outputs an alignment file, estimates of genome completeness and redundancy, and a phylogenomic tree based on a specified single-copy gene (SCG) set. Although GToTree can work with any custom hidden Markov Models (HMMs), also included are 13 newly generated SCG-set HMMs for different lineages and levels of resolution, built based on searches of ~12 000 bacterial and archaeal high-quality genomes. GToTree aims to give more researchers the capability to make phylogenomic trees. AVAILABILITY AND IMPLEMENTATION: GToTree is open-source and freely available for download from: github.com/AstrobioMike/GToTree. It is implemented primarily in bash with helper scripts written in Python. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

SUBMITTER: Lee MD 

PROVIDER: S-EPMC6792077 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

GToTree: a user-friendly workflow for phylogenomics.

Lee, Michael D.

Bioinformatics (Oxford, England), 2019-10-01, issue 20



Similar Datasets

| S-EPMC6659180 | biostudies-literature
| S-EPMC5310375 | biostudies-literature
| S-EPMC7114615 | biostudies-literature
| S-EPMC8352501 | biostudies-literature
| S-EPMC4228112 | biostudies-literature
| S-EPMC8279420 | biostudies-literature
2013-06-13 | GSE40748 | GEO
| S-EPMC3840672 | biostudies-literature
2013-06-13 | E-GEOD-40748 | biostudies-arrayexpress
| S-EPMC7935533 | biostudies-literature