Dataset Information

A model to predict the function of hypothetical proteins through a nine-point classification scoring schema.

ABSTRACT: BACKGROUND: Hypothetical proteins (HPs) are proteins that are predicted to be expressed in an organism but for which no evidence of existence is available. In the recent past, annotation and curation efforts have helped address the challenge of understanding their diverse functions. Researchers have developed techniques to decipher sequence-structure-function relationships, especially for the functional modelling of HPs, but using these features as classifiers for HPs has not been attempted. Alongside the growing number of annotation strategies, next-generation sequencing methods have further improved our understanding of HP functions. RESULTS: In our previous work, we developed a six-point classification scoring schema with annotation based on protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, and known databases and visualizers, which were used to validate protein interactions. In this study, we introduce three more classifiers to our annotation system, namely pseudogenes linked to HPs, homology modelling, and non-coding RNAs associated with HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics, with accuracy improving for Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00), and SMO_npolyk (59.46 to 96.67). CONCLUSION: With the introduction of three new classification features, the nine-point classification scoring schema functionally annotates HPs with improved accuracy.
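To make the idea of "features as classifiers" concrete, here is a minimal Python sketch of a nine-point score, assuming each of the nine classifiers contributes one binary point (evidence present or absent) with equal weight. The feature names are hypothetical labels paraphrased from the abstract, not identifiers from the authors' pipeline, and the paper's actual scoring and weighting may differ. The fixed-order 0/1 vector is the kind of input that classifiers such as Perceptron, Naive Bayes, J48 or SMO could be trained on.

```python
from typing import List, Mapping

# Hypothetical feature names paraphrased from the abstract (not the authors' identifiers).
ORIGINAL_SIX = (
    "protein_family_score",
    "orthology",
    "protein_interaction_association",
    "bidirectional_best_blast_hit",
    "sorting_signal",
    "known_databases_visualizers",
)
NEW_THREE = (
    "linked_pseudogene",
    "homology_model",
    "associated_ncrna",
)
NINE_POINT_FEATURES = ORIGINAL_SIX + NEW_THREE


def feature_vector(evidence: Mapping[str, bool]) -> List[int]:
    """Encode evidence as a fixed-order 0/1 vector over the nine features.

    Assumption: every classifier is binary; features absent from
    `evidence` are treated as having no supporting evidence.
    """
    unknown = set(evidence) - set(NINE_POINT_FEATURES)
    if unknown:
        raise ValueError(f"unrecognised features: {sorted(unknown)}")
    return [int(bool(evidence.get(name, False))) for name in NINE_POINT_FEATURES]


def nine_point_score(evidence: Mapping[str, bool]) -> int:
    """Equal-weight score (0-9): the number of classifiers reporting support."""
    return sum(feature_vector(evidence))


if __name__ == "__main__":
    # Example HP with family, orthology, best-BLAST-hit and homology-model support.
    hp_evidence = {
        "protein_family_score": True,
        "orthology": True,
        "bidirectional_best_blast_hit": True,
        "homology_model": True,
    }
    print(feature_vector(hp_evidence))  # [1, 1, 0, 1, 0, 0, 0, 1, 0]
    print(nine_point_score(hp_evidence))  # 4
```

Rejecting unknown feature names, rather than silently ignoring them, keeps a typo from quietly lowering a protein's score.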

SUBMITTER: Ijaq J

PROVIDER: S-EPMC6325861 | biostudies-other | 2019 Jan

REPOSITORIES: biostudies-other


Publications

A model to predict the function of hypothetical proteins through a nine-point classification scoring schema.

Johny Ijaq, Girik Malik, Anuj Kumar, Partha Sarathi Das, Narendra Meena, Neeraja Bethi, Vijayaraghava Seshadri Sundararajan, Prashanth Suravajhala

BMC Bioinformatics, 2019 Jan 8; issue 1


PMID: 30621574

Similar Datasets

A classification scoring schema to validate protein interactors. | S-EPMC3282273 | biostudies-literature

Scoring function to predict solubility mutagenesis. | S-EPMC2958853 | biostudies-literature

Function prediction and analysis of mycobacterium tuberculosis hypothetical proteins. | S-EPMC3397526 | biostudies-literature

Hybrid scoring and classification approaches to predict human pregnane X receptor activators. | S-EPMC2836910 | biostudies-literature

A Mixed QM/MM Scoring Function to Predict Protein-Ligand Binding Affinity. | S-EPMC3017370 | biostudies-literature

Investigating function roles of hypothetical proteins encoded by the Mycobacterium tuberculosis H37Rv genome. | S-EPMC6528289 | biostudies-literature

A new pathological scoring system by the Japanese classification to predict renal outcome in diabetic nephropathy. | S-EPMC5800536 | biostudies-literature

A knowledge-based scoring function to assess quaternary associations of proteins. | S-EPMC7425177 | biostudies-literature

Lipid transfer proteins: classification, nomenclature, structure, and function. | S-EPMC5052319 | biostudies-literature

A hierarchical anatomical classification schema for prediction of phenotypic side effects. | S-EPMC5832387 | biostudies-literature