
Dataset Information


Cross-validation of protein structural class prediction using statistical clustering and neural networks.


ABSTRACT: We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared to those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used to drive these algorithms consist of the normalized frequency of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values the structural class predictions for each protein (all-alpha, all-beta, or alpha-beta classes) are derived. Examples consisting of 64 previously classified proteins were randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best performing algorithm on the test sets was the learning vector quantization network using 17 inputs, obtaining a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes. The differences between algorithms are in general not statistically significant. These results show that information exists in protein primary sequences that is easily obtainable and useful for the prediction of protein structural class by neural networks as well as by standard statistical clustering algorithms.
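The feature and evaluation steps named in the abstract can be illustrated with a minimal sketch. It assumes a plain nearest-centroid rule with ordinary Euclidean distance, which stands in for (and is not) the authors' modified Euclidean clustering algorithm, and it covers only the amino acid composition inputs, not the six hydrophobic patterns, which the abstract does not define. All function names and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter
import math

# The 20 standard amino acid one-letter codes (the abstract uses "up to 20" types).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the normalized frequency of each of the 20 amino acid types."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_class(vector, centroids):
    """Assumed rule: pick the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def matthews_cc(true_labels, predicted_labels, positive):
    """Matthews correlation coefficient for one class versus all the others."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is reported as 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In use, one would build a centroid per structural class from the training proteins, for example `centroids = {"all-alpha": centroid([composition(s) for s in alpha_training])}` with `alpha_training` a hypothetical list of sequences, then call `nearest_class(composition(test_sequence), centroids)` on each held-out protein and score each class with `matthews_cc`.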

SUBMITTER: Metfessel BA 

PROVIDER: S-EPMC2142422 | biostudies-other | 1993 Jul

REPOSITORIES: biostudies-other


Publications

Cross-validation of protein structural class prediction using statistical clustering and neural networks.

Metfessel BA, Saurugger PN, Connelly DP, Rich SS

Protein Science: a publication of the Protein Society, 1993-07-01, issue 7



Similar Datasets

| S-EPMC2423446 | biostudies-literature
| S-EPMC7790373 | biostudies-literature
| S-EPMC7689358 | biostudies-literature
| S-EPMC8180888 | biostudies-literature
| S-EPMC3341732 | biostudies-other
| S-EPMC6612824 | biostudies-other
| S-EPMC5745637 | biostudies-literature
| S-EPMC8622176 | biostudies-literature