Dataset Information

Enzyme classification with peptide programs: a comparative study.

ABSTRACT:

Background

Efficient and accurate prediction of protein function from sequence is one of the standing problems in Biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded by protein sequences because they need to transform them to obtain data of fixed length.

Results

We have developed a machine learning methodology, called peptide programs (PPs), to deal directly with protein sequences and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs performed similarly in terms of Matthews Correlation Coefficient, but the PPs generally had higher precision. BLAST performed better overall than both methodologies, but the PPs outperformed both BLAST and SVMs on the smaller datasets.
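The two metrics named above can be computed from a binary confusion matrix. The sketch below is only illustrative: the function names and the counts in the usage example are hypothetical and are not taken from the study.

```python
import math


def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the calculation.
example_precision = precision(tp=8, fp=2)
example_mcc = matthews_corrcoef(tp=8, tn=8, fp=2, fn=2)
```

Precision looks only at the predicted positives, so it penalises false annotations directly, whereas the Matthews Correlation Coefficient uses all four cells of the confusion matrix. This is why two classifiers can have a similar coefficient but different precision.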

Conclusion

The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy.

SUBMITTER: Faria D

PROVIDER: S-EPMC2724424 | biostudies-literature | 2009 Jul

REPOSITORIES: biostudies-literature

Publications

Enzyme classification with peptide programs: a comparative study.

Daniel Faria, António E N Ferreira, André O Falcão

BMC Bioinformatics, 2009-07-24

PMID: 19630945

Similar Datasets

Project description:

Background

Two prostate cancer (PC) classification methods based on transcriptome profiles, a de novo method referred to as the "Prostate Cancer Classification System" (PCS) and a variation of the established PAM50 breast cancer algorithm, were recently proposed. Both studies concluded that most human PC can be assigned to one of three tumor subtypes, two categorized as luminal and one as basal, suggesting the two methods reflect consistency in underlying biology. Despite the similarity, differences and commonalities between the two classification methods have not yet been reported.

Methods

Here, we describe a comparison of the PCS and PAM50 classification systems. PCS and PAM50 signatures consisting of 37 (PCS37) and 50 genes, respectively, were used to categorize 9,947 PC patients into PCS and PAM50 classes. Enrichment of hallmark gene sets and luminal and basal marker gene expression were assessed in the same datasets. Finally, survival analysis was performed to compare PCS and PAM50 subtypes in terms of clinical outcomes.

Results

PCS and PAM50 subtypes show clear differential expression of PCS37 and PAM50 genes. While only three genes are shared in common between the two systems, there is some consensus between three subtype pairs (PCS1 versus Luminal B, PCS2 versus Luminal A, and PCS3 versus Basal) with respect to gene expression, cellular processes, and clinical outcomes. PCS categories displayed better separation of cellular processes and luminal and basal marker gene expression compared to PAM50. Although both PCS1 and Luminal B tumors exhibited the worst clinical outcomes, outcomes between aggressive and less aggressive subtypes were better defined in the PCS system, based on larger hazard ratios observed.

Conclusion

The PCS and PAM50 classification systems are similar in terms of molecular profiles and clinical outcomes. However, the PCS system exhibits greater separation in multiple clinical outcomes and provides better separation of prostate luminal and basal characteristics.
