Dataset Information

PhyloHerb: A high-throughput phylogenomic pipeline for processing genome skimming data.


ABSTRACT:

Premise

The application of high-throughput sequencing, especially to herbarium specimens, is rapidly accelerating biodiversity research. Low-coverage sequencing of total genomic DNA (genome skimming) is particularly promising and can simultaneously recover the plastid, mitochondrial, and nuclear ribosomal regions across hundreds of species. Here, we introduce PhyloHerb, a bioinformatic pipeline to efficiently assemble phylogenomic data sets derived from genome skimming.

Methods and results

PhyloHerb uses either a built-in database or user-specified references to extract orthologous sequences from all three genomes using a BLAST search. It outputs FASTA files and offers a suite of utility functions to assist with alignment, partitioning, concatenation, and phylogeny inference. The program is freely available at https://github.com/lmcai/PhyloHerb/.
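The concatenation and partitioning step mentioned above can be illustrated with a minimal sketch. This is not PhyloHerb's own code or interface: the function name `concatenate_alignments`, the in-memory dictionary layout, and the toy loci are all assumptions made for illustration. It shows only the general technique of joining per-locus alignments into a supermatrix, padding taxa that lack a locus with gaps, and recording where each locus starts and ends.

```python
def concatenate_alignments(alignments, missing="-"):
    """Join per-locus alignments into one supermatrix (hypothetical helper).

    alignments: dict mapping locus name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions), where supermatrix maps taxon -> sequence
    and partitions is a list of (locus, start, end) with 1-based inclusive ends.
    """
    # Collect every taxon seen in any locus so that all rows end up equal length.
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    position = 0
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa missing from this locus are filled with gap characters.
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((locus, position + 1, position + length))
        position += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input: two loci with partially overlapping taxon sampling (invented data).
example_loci = {
    "rbcL": {"sp1": "ATGC", "sp2": "ATGA"},
    "ITS": {"sp1": "GG-T", "sp3": "GGCT"},
}
example_matrix, example_partitions = concatenate_alignments(example_loci)
```

The partition list can then be written out in whatever format the downstream phylogeny program expects; that formatting step is omitted here because it depends on the tool.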

Conclusions

We demonstrate that PhyloHerb can accurately identify genes using a published data set from Clusiaceae. We also show via simulations that our approach is effective for highly fragmented assemblies from herbarium specimens and is scalable to thousands of species.

SUBMITTER: Cai L 

PROVIDER: S-EPMC9215275 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC7714717 | biostudies-literature
| S-EPMC4412416 | biostudies-literature
| S-EPMC6793853 | biostudies-literature
| S-ECPF-GEOD-40617 | biostudies-other
| S-EPMC4700955 | biostudies-literature
| S-EPMC4152589 | biostudies-literature
| S-EPMC7799681 | biostudies-literature
| S-EPMC3973741 | biostudies-literature
| S-EPMC3738164 | biostudies-literature
| S-EPMC5320598 | biostudies-literature