Dataset Information

Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

ABSTRACT: Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high-dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).

SUBMITTER: Neuwald AF

PROVIDER: S-EPMC5225019 | biostudies-literature | 2016 Dec

REPOSITORIES: biostudies-literature

Publications

Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

Neuwald AF, Altschul SF

PLoS Computational Biology, 2016-12-21; 12

PMID: 28002465

Similar Datasets

Project description: Background: Unigene sequences constitute a rich source of functionally relevant microsatellites. The present study was undertaken to mine the microsatellites in the available unigene sequences of sugarcane for understanding their constitution in the expressed genic component of its complex polyploid/aneuploid genome, assessing their functional significance in silico, determining the extent of allelic diversity at the microsatellite loci and for evaluating their utility in large-scale genotyping applications in sugarcane.

Results: The average frequency of perfect microsatellite was 1/10.9 kb, while it was 1/44.3 kb for the long and hypervariable class I repeats. GC-rich trinucleotides coding for alanine and the GA-rich dinucleotides were the most abundant microsatellite classes. Out of 15,594 unigenes mined in the study, 767 contained microsatellite repeats and for 672 of these putative functions were determined in silico. The microsatellite repeats were found in the functional domains of proteins encoded by 364 unigenes. Its significance was assessed by establishing the structure-function relationship for the beta-amylase and protein kinase encoding unigenes having repeats in the catalytic domains. A total of 726 allelic variants (7.42 alleles per locus) with different repeat lengths were captured precisely for a set of 47 fluorescent dye labeled primers in 36 sugarcane genotypes and five cereal species using the automated fragment analysis system, which suggested the utility of designed primers for rapid, large-scale and high-throughput genotyping applications in sugarcane. Pair-wise similarity ranging from 0.33 to 0.84 with an average of 0.40 revealed a broad genetic base of the Indian varieties in respect of functionally relevant regions of the large and complex sugarcane genome.

Conclusion: Microsatellite repeats were present in 4.92% of sugarcane unigenes, for most (87.6%) of which functions were determined in silico. High level of allelic diversity in repeats including those present in the functional domains of proteins encoded by the unigenes demonstrated their use in assay of useful variation in the genic component of complex polyploid sugarcane genome.
