Venomix: a simple bioinformatic pipeline for identifying and characterizing toxin gene candidates from transcriptomic data.
ABSTRACT: The advent of next-generation sequencing has enabled transcriptome-based approaches for investigating functionally significant biological components in a variety of non-model organisms. This has given rise to "venomics": a rapidly growing field that uses combined transcriptomic and proteomic datasets to characterize toxin diversity in a variety of venomous taxa. The transcriptomic portion of these analyses follows very similar pathways after transcriptome assembly, often including candidate toxin identification using BLAST, expression level screening, protein sequence alignment, gene tree reconstruction, and characterization of potential toxin function. Here we describe the Python package Venomix, which streamlines these processes using common bioinformatic tools along with ToxProt, a publicly available annotated database composed of characterized venom proteins. In this study, we use the Venomix pipeline to characterize candidate venom diversity in four phylogenetically distinct organisms: a cone snail (Conidae; Conus sponsalis), a snake (Viperidae; Echis coloratus), an ant (Formicidae; Tetramorium bicarinatum), and a scorpion (Scorpionidae; Urodacus yaschenkoi). Data on these organisms were sampled from public databases, with each original analysis using different approaches for transcriptome assembly, toxin identification, or gene expression quantification. Venomix recovered numerically more candidate toxin transcripts for three of the four transcriptomes than the original analyses and identified new toxin candidates. In summary, we show that the Venomix package is a useful tool to identify and characterize the diversity of toxin-like transcripts derived from transcriptomic datasets. Venomix is available at: https://bitbucket.org/JasonMacrander/Venomix/.
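The first two steps named in the abstract (candidate identification from BLAST hits, then expression level screening) can be illustrated with a minimal sketch. This is not Venomix's own code or API: the function name `filter_toxin_candidates`, the thresholds `max_evalue` and `min_tpm`, and the use of TPM as the expression measure are all assumptions made for illustration. The only external convention relied on is the standard 12-column BLAST tabular output (`-outfmt 6`), where column 11 is the E-value and column 12 is the bit score.

```python
import csv
import io


def filter_toxin_candidates(blast_tsv, expression, max_evalue=1e-5, min_tpm=1.0):
    """Keep transcripts with a significant hit and sufficient expression.

    blast_tsv  -- text in BLAST tabular format (-outfmt 6, default 12 columns),
                  e.g. transcripts searched against a toxin database
    expression -- dict mapping transcript ID to an expression value
                  (TPM is assumed here; hypothetical choice)
    Returns a dict mapping each retained transcript ID to its best-scoring
    subject ID. Thresholds are illustrative defaults, not Venomix settings.
    """
    best = {}
    reader = csv.reader(io.StringIO(blast_tsv), delimiter="\t")
    for row in reader:
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit is not significant enough
        if expression.get(query, 0.0) < min_tpm:
            continue  # transcript is expressed too weakly
        # Keep only the highest bit score hit per transcript.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

The retained transcripts would then feed the later steps the abstract lists (protein sequence alignment and gene tree reconstruction), which are typically delegated to external tools.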
SUBMITTER: Macrander J
PROVIDER: S-EPMC6074769 | biostudies-literature | 2018
REPOSITORIES: biostudies-literature