
Dataset Information


Prediction of functional class of novel viral proteins by a statistical learning method irrespective of sequence similarity.


ABSTRACT: The function of a substantial percentage of the putative protein-coding open reading frames (ORFs) in viral genomes is unknown. Because their sequences are not similar to those of proteins of known function, the function of these ORFs cannot be assigned on the basis of sequence similarity. Methods that complement, or work in combination with, sequence similarity-based approaches are being explored. The web-based software SVMProt (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi) assigns protein functional family, to some extent, irrespective of sequence similarity and has been found useful for studying distantly related proteins [Cai, C.Z., Han, L.Y., Ji, Z.L., Chen, X., Chen, Y.Z., 2003. SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res. 31(13): 3692-3697]. Here, 25 novel viral proteins are selected to test the capability of SVMProt for functional family assignment of viral proteins whose function cannot at present be confidently predicted by sequence similarity methods. These proteins have no sequence homolog in the Swissprot database, have precise functions described in the literature, and are not included in the training sets of SVMProt. The predicted functional classes of 72% of these proteins match the literature-described function, compared with an overall accuracy of 87% for SVMProt functional class assignment of 34582 proteins. This suggests that SVMProt is, to some extent, capable of functional class assignment irrespective of sequence similarity and is potentially useful for facilitating functional studies of novel viral proteins.
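
The accuracy figures in the abstract can be put in perspective with a little arithmetic. The sketch below is illustrative only and is not part of the original study: it converts the reported 72% of 25 proteins into a count, and computes a 95% Wilson score interval to show how wide the uncertainty is for a test set this small. The choice of the Wilson interval, the function name `wilson_interval`, and the variable names are assumptions made for this example.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Figures taken from the abstract.
n_proteins = 25            # novel viral proteins tested
reported_accuracy = 0.72   # fraction matching the literature-described function
overall_accuracy = 0.87    # reported overall SVMProt accuracy (34582 proteins)

# 72% of 25 corresponds to 18 correctly assigned proteins.
matched = round(reported_accuracy * n_proteins)

# Assumption: a Wilson interval is used here purely to illustrate the
# uncertainty of a 25-protein sample; the source reports no interval.
low, high = wilson_interval(matched, n_proteins)

print(f"matched proteins: {matched} of {n_proteins}")
print(f"approx. 95% interval for accuracy: {low:.3f} to {high:.3f}")
print(f"overall accuracy for comparison: {overall_accuracy:.2f}")
```

With only 25 proteins the interval is wide, so the gap between 72% and 87% should be read as indicative rather than as a precise estimate.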

SUBMITTER: Han LY 

PROVIDER: S-EPMC7111859 | biostudies-literature | 2005 Jan

REPOSITORIES: biostudies-literature


Publications

Prediction of functional class of novel viral proteins by a statistical learning method irrespective of sequence similarity.

Han LY, Cai CZ, Ji ZL, Chen YZ

Virology, 2005 Jan 1, issue 1



Similar Datasets

| S-EPMC535691 | biostudies-literature
| S-EPMC2789692 | biostudies-literature
| S-EPMC4985167 | biostudies-literature
| S-EPMC7470713 | biostudies-literature
| S-EPMC7156691 | biostudies-literature
| S-EPMC10138783 | biostudies-literature
| S-EPMC10557500 | biostudies-literature
| S-EPMC2760442 | biostudies-literature
| S-EPMC2573399 | biostudies-literature
| S-EPMC11327874 | biostudies-literature