Dataset Information

Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

ABSTRACT:

Background

Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences.

Results

The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows for selection of any subset of the classes. MODAS is also the first to use a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% depending on the dataset (83.5% for the twilight-zone datasets). This translates into 19% and 8% error rate reductions compared with the best-performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
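The modular design, with a classifier tailored to each class and the option of predicting over any subset of the classes, can be illustrated with a short sketch. It is not the MODAS implementation: it assumes that each per-class classifier exposes a real-valued decision score (as an SVM decision function would) and that the prediction is simply the highest-scoring class among the selected subset. The toy linear scorers, their weights, and the class labels are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical type: a per-class binary classifier that maps a feature
# vector to a real-valued score (higher = more likely to be in the class).
Scorer = Callable[[List[float]], float]


def predict_class(features: List[float],
                  scorers: Dict[str, Scorer],
                  selected: Iterable[str]) -> str:
    """Return the highest-scoring class among the selected subset.

    Assumption: every class has its own independently trained binary
    classifier, so restricting the prediction to a subset only requires
    ignoring the scorers of the excluded classes.
    """
    subset = list(selected)
    if not subset:
        raise ValueError("select at least one class")
    missing = [c for c in subset if c not in scorers]
    if missing:
        raise KeyError(f"no classifier for: {missing}")
    return max(subset, key=lambda c: scorers[c](features))


def linear(weights: List[float], bias: float) -> Scorer:
    # Toy linear decision function standing in for a trained SVM.
    return lambda x: sum(w * v for w, v in zip(weights, x)) + bias


# Made-up weights, for illustration only.
scorers = {
    "all_alpha": linear([1.0, -1.0], 0.0),
    "all_beta": linear([-1.0, 1.0], 0.0),
    "membrane": linear([0.5, 0.5], -0.2),
}

x = [0.8, 0.1]  # e.g. fraction of residues predicted as helix and as strand
print(predict_class(x, scorers, ["all_alpha", "all_beta", "membrane"]))  # all_alpha
print(predict_class(x, scorers, ["all_beta", "membrane"]))  # membrane
```

Because each class is scored independently, dropping a class from `selected` changes only which scores are compared, not how any classifier was trained.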

Conclusions

The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
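Features that express the collocation of secondary structure segments along the chain can be approximated as follows. This is a sketch under stated assumptions, not the MODAS feature set: it takes a predicted secondary structure string over an assumed alphabet of H (helix), E (strand) and C (coil), uses made-up minimum segment lengths, and ignores the evolutionary (conservation) component that MODAS combines with the secondary structure.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def segments(ss: str, min_len: Dict[str, int]) -> List[Tuple[str, int]]:
    """Split a predicted secondary structure string into (state, length)
    runs and keep only the helix/strand runs that reach min_len."""
    runs = [(state, len(list(group))) for state, group in groupby(ss)]
    return [(s, n) for s, n in runs if s in min_len and n >= min_len[s]]


def collocation_features(ss: str) -> Dict[str, float]:
    # Hypothetical thresholds: helices of >= 4 residues, strands of >= 2.
    segs = segments(ss, {"H": 4, "E": 2})
    states = [s for s, _ in segs]
    # Consecutive pairs of retained segments along the chain (coil and
    # too-short segments are skipped, so their neighbours become adjacent).
    pairs = list(zip(states, states[1:]))

    def frac(*kinds: Tuple[str, str]) -> float:
        # Fraction of consecutive segment pairs matching any given kind.
        if not pairs:
            return 0.0
        return sum(pairs.count(k) for k in kinds) / len(pairs)

    return {
        "n_helix": float(states.count("H")),
        "n_strand": float(states.count("E")),
        "HH": frac(("H", "H")),
        "EE": frac(("E", "E")),
        "HE_or_EH": frac(("H", "E"), ("E", "H")),
    }


features = collocation_features("CCHHHHHCCEEECCEEEECCHHHHHHCC")
print(features)
```

For the example string, the retained segments occur in the order helix, strand, strand, helix, so the sketch reports two helices, two strands, no helix-helix neighbours, one strand-strand pair and two mixed pairs out of three.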

SUBMITTER: Mizianty MJ

PROVIDER: S-EPMC2805645 | biostudies-literature | 2009 Dec

REPOSITORIES: biostudies-literature

Publications

Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

Mizianty Marcin J, Kurgan Lukasz

BMC Bioinformatics, 2009 Dec 13

PMID: 20003388

Similar Datasets

Project description:
Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as its structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the identity of any pair of sequences falls in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction.
Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed on datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors.
Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction.
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.
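The post-processing idea in the last sentence can be illustrated by a simple filter that discards fold candidates whose structural class disagrees with the predicted class. This is a hypothetical sketch, not SCPRED's published procedure: the hit format, the score convention, and the fallback to the unfiltered ranking when no candidate agrees are all assumptions.

```python
from typing import List, Tuple

# Hypothetical hit format: (fold_id, structural_class, score); a higher
# score means a more confident fold assignment.
Hit = Tuple[str, str, float]


def filter_by_class(hits: List[Hit], predicted_class: str) -> List[Hit]:
    """Rank hits by score and drop those whose structural class disagrees
    with the predicted class. If no hit agrees, return the full ranking
    so the fold classifier's output is never emptied by the filter."""
    ranked = sorted(hits, key=lambda h: h[2], reverse=True)
    consistent = [h for h in ranked if h[1] == predicted_class]
    return consistent if consistent else ranked


hits = [
    ("fold_1", "beta", 0.91),
    ("fold_2", "alpha", 0.88),
    ("fold_3", "alpha", 0.40),
]
print([h[0] for h in filter_by_class(hits, "alpha")])  # ['fold_2', 'fold_3']
print([h[0] for h in filter_by_class(hits, "alpha/beta")])  # ['fold_1', 'fold_2', 'fold_3']
```

The fallback is a design choice for the sketch: a wrong class prediction then costs nothing when no candidate matches, at the price of not signalling the disagreement.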

Project description:
Background: Prediction of protein structural classes (alpha, beta, alpha + beta and alpha/beta) from amino acid sequences is of great importance, as it benefits the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach up to 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, with prediction accuracy often below 60%.
Results: We propose a new method to predict protein structural classes on the basis of features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure for each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of alpha helices and beta strands. Thus, it is a valuable method for predicting protein structural classes, particularly for low-homology amino acid sequences.
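Two of the feature-extraction steps named above, the chaos game representation (CGR) of a predicted secondary structure string as two time series and a K-string information entropy, can be sketched in Python. The vertex placement, the starting point at the triangle's centroid, and the plain Shannon-entropy definition are assumptions rather than the published method; the recurrence quantification analysis, segment-based features and Fisher's discriminant step are not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Assumed vertex placement: one corner of an equilateral triangle per
# secondary structure state (H = helix, E = strand, C = coil).
VERTICES: Dict[str, Tuple[float, float]] = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_series(ss: str) -> Tuple[List[float], List[float]]:
    """Chaos game representation: starting from the triangle's centroid,
    move halfway toward the vertex of each successive state. The x and y
    coordinates of the visited points form two time series.
    Raises KeyError for symbols other than H, E and C."""
    x, y = 0.5, math.sqrt(3) / 6  # centroid of the triangle
    xs: List[float] = []
    ys: List[float] = []
    for state in ss:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


def kstring_entropy(ss: str, k: int) -> float:
    """Shannon entropy (in bits) of the overlapping k-string frequencies."""
    kmers = [ss[i:i + k] for i in range(len(ss) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


xs, ys = cgr_series("HHHHCCEEEECC")
print(len(xs), len(ys))  # 12 12
print(kstring_entropy("HEHE", 1))  # 1.0
```

Each CGR point depends on the whole preceding sequence, with the influence of earlier states halving at every step, which is why the two coordinate series can carry sequence-order information into downstream time-series features.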

Project description: What happens in the brain when conscious awareness of the surrounding world fades? We manipulated consciousness in two experiments in a group of healthy males and measured brain activity with positron emission tomography. Measurements were made during wakefulness, escalating and constant levels of two anesthetic agents (experiment 1, n = 39), and during sleep-deprived wakefulness and non-rapid eye movement sleep (experiment 2, n = 37). In experiment 1, the subjects were randomized to receive either propofol or dexmedetomidine until unresponsiveness. In both experiments, forced awakenings were applied to achieve rapid recovery from an unresponsive to a responsive state, followed by immediate and detailed interviews of subjective experiences during the preceding unresponsive condition. Unresponsiveness rarely denoted unconsciousness, as the majority of the subjects had internally generated experiences. Unresponsive anesthetic states and verified sleep stages, where a subsequent report of mental content included no signs of awareness of the surrounding world, indicated a disconnected state. Functional brain imaging comparing responsive and connected versus unresponsive and disconnected states of consciousness during constant anesthetic exposure revealed that activity of the thalamus, cingulate cortices, and angular gyri is fundamental for human consciousness. These brain structures were affected independently of the pharmacologic agent, drug concentration, and direction of change in the state of consciousness. Analogous findings were obtained when consciousness was regulated by physiological sleep. State-specific findings were distinct and separable from the overall effects of the interventions, which included widespread depression of brain activity across cortical areas.
These findings identify a central core brain network critical for human consciousness.
SIGNIFICANCE STATEMENT: Trying to understand the biological basis of human consciousness is currently one of the greatest challenges of neuroscience. While the loss and return of consciousness regulated by anesthetic drugs and physiological sleep are used as model systems in experimental studies on consciousness, previous research results have been confounded by drug effects, by confusing behavioral "unresponsiveness" and internally generated consciousness, and by comparing brain activity levels across states that differ in several respects other than consciousness. Here, we present carefully designed studies that overcome many previous confounders and for the first time reveal the neural mechanisms underlying human consciousness and its disconnection from behavioral responsiveness, both during anesthesia and during normal sleep, and in the same study subjects.
