Dataset Information

The CaspBase: a curated database for evolutionary biochemical studies of caspase functional divergence and ancestral sequence inference.

ABSTRACT: Sequence databases are powerful tools in the contemporary scientist's toolkit. However, most functional annotations in public databases are determined computationally and are not verified by a human expert. While hypotheses generated from computational studies are now amenable to experimentation, the quality of the results relies on the quality of input data. We developed the CaspBase to expedite high-quality dataset compilation of annotated caspase sequences, to maximize phylogenetic signal, and to reduce the noise contributed from public databanks. We describe our methods of curation for the CaspBase and how researchers can acquire sequences from CaspBase.org. Our immediate goal for developing the CaspBase was to optimize the ancestral protein reconstruction (APR) of caspases, and we demonstrate the utility of the CaspBase in APR studies. We also developed the Common Position (CP) system for comparing human caspase family paralogs and suggest the CP system as an update to current reporting methods of caspase amino acid positions. We present a standardized multiple sequence alignment (MSA) for the CP system and show the advantage of using large databases such as the CaspBase in defining structural positions in proteins. Although the results described here pertain to caspase evolution and structure-function studies, the methods can be adapted to any gene family.

SUBMITTER: Grinshpon RD

PROVIDER: S-EPMC6199153 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature

Publications

The CaspBase: a curated database for evolutionary biochemical studies of caspase functional divergence and ancestral sequence inference.

Robert D Grinshpon, Anna Williford, James Titus-McQuillan, A Clay Clark

Protein Science: a publication of the Protein Society, 2018 Oct 1, issue 10

PMID: 30076665

Similar Datasets

Project description: Recurring patterns of primary structure have been observed in enzymes that mediate sequential metabolic reactions in bacteria. The enzymes, muconolactone Delta-isomerase [(+)-4-hydroxy-4-carboxymethylisocrotonolactone Delta(2)-Delta(3)-isomerase, EC 5.3.3.4] and beta-ketoadipate enol-lactone hydrolase [4-carboxymethylbut-3-enolide(1,4)enol-lactone-hydrolase, EC 3.1.1.24], have been coselected in bacterial populations because the isomerase can confer no nutritional advantage in the absence of the hydrolase. Similar amino acid sequences recur within the structure of the isomerase, and the amino-terminal amino acid sequence of the isomerase from Pseudomonas putida appears to be evolutionarily homologous with the corresponding sequence of a beta-ketoadipate enol-lactone hydrolase from Acinetobacter calcoaceticus. One interpretation of the sequence repetitions is that they reflect tandem duplication mutations that took place early in the evolution of the proteins. According to this view, the mutations caused elongation of structural genes and the creation of duplicated genes as the metabolic pathways evolved. A review of the sequence data calls attention to a different hypothesis: repeated amino acid sequences were introduced in the course of the proteins' evolution by substitution of copies of DNA sequences into structural genes. Our observations are interpreted on the basis of a model proposing genetic exchange between misaligned DNA sequences. The model predicts that misalignments in one chromosomal region can influence the nature of mutations in another region. Thus, as often has been observed, the mutability of a base pair will be determined by its location in a DNA sequence. Furthermore, the intrachromosomal recombination of DNA sequences may account for complex genetic modifications that occur as new pathways evolve. The model provides an interpretation of an apparent paradox, the rapid creation of new metabolic traits by bacterial genomes that are remarkably resistant to genetic drift.

Project description: Assassin bugs are one of the most successful clades of predatory animals based on their species numbers (∼6,800 spp.) and wide distribution in terrestrial ecosystems. Various novel prey capture strategies and remarkable prey specializations contribute to their appeal as a model to study evolutionary pathways involved in predation. Here, we reconstruct the most comprehensive reduviid phylogeny (178 taxa, 18 subfamilies) to date based on molecular data (5 markers). This phylogeny tests current hypotheses on reduviid relationships emphasizing the polyphyletic Reduviinae and the blood-feeding, disease-vectoring Triatominae, and allows us, for the first time in assassin bugs, to reconstruct ancestral states of prey associations and microhabitats. Using a fossil-calibrated molecular tree, we estimated divergence times for key events in the evolutionary history of Reduviidae. Our results indicate that the polyphyletic Reduviinae fall into 11-14 separate clades. Triatominae are paraphyletic with respect to the reduviine genus Opisthacidius in the maximum likelihood analyses; this result is in contrast to prior hypotheses that found Triatominae to be monophyletic or polyphyletic and may be due to the more comprehensive taxon and character sampling in this study. The evolution of blood-feeding may thus have occurred once or twice independently among predatory assassin bugs. All prey specialists evolved from generalist ancestors, with multiple evolutionary origins of termite and ant specializations. A bark-associated lifestyle on tree trunks is ancestral for most of the lineages of Higher Reduviidae; living on foliage has evolved at least six times independently. Reduviidae originated in the Middle Jurassic (178 Ma), but significant lineage diversification only began in the Late Cretaceous (97 Ma). The integration of molecular phylogenetics with fossil and life history data as presented in this paper provides insights into the evolutionary history of reduviids and clears the way for in-depth evolutionary hypothesis testing in one of the most speciose clades of predators.
