Dataset Information

A homology identification method that combines protein sequence and structure information.


ABSTRACT: A new method is presented for identifying distantly related homologous proteins that are unrecognizable by conventional sequence comparison methods. The method combines information about functionally conserved sequence patterns with information about structure context. This information is encoded in stochastic discrete state-space models (DSMs) that comprise a new family of hidden Markov models. The new models are called sequence-pattern-embedded DSMs (pDSMs). This method can identify distantly related protein family members with a high sensitivity and specificity. The method is illustrated with trypsin-like serine proteases and globins. The strategy for building pDSMs is presented. The method has been validated using carefully constructed positive and negative control sets. In addition to the ability to recognize remote homologs, pDSM sequence analysis predicts secondary structures with higher sensitivity, specificity, and Q3 accuracy than DSM analysis, which omits information about conserved sequence patterns. The identification of trypsin-like serine proteases in new genomes is discussed.
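The abstract compares pDSM and DSM secondary-structure predictions by sensitivity, specificity, and Q3 accuracy. As a rough illustration of what those scores measure, here is a minimal Python sketch. It assumes the usual three-state labels (H = helix, E = strand, C = coil) and the standard binary-classification definitions of sensitivity and specificity. The record does not give the paper's exact definitions, so the function names, labels, and toy strings below are hypothetical and not taken from the authors' software.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def state_sensitivity_specificity(predicted: str, observed: str, state: str):
    """Per-state sensitivity TP/(TP+FN) and specificity TN/(TN+FP).

    Assumption: standard binary-classification definitions, treating
    `state` as the positive class and the other two states as negative.
    """
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if o == state and p == state:
            tp += 1
        elif o == state:
            fn += 1
        elif p == state:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (made-up labels, not data from the paper).
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
q3 = q3_accuracy(predicted, observed)
helix_sens, helix_spec = state_sensitivity_specificity(predicted, observed, "H")
```

Q3 is a single per-residue score across all three states, while sensitivity and specificity are computed one state at a time, which is why the abstract reports them separately.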

SUBMITTER: Yu L 

PROVIDER: S-EPMC2143896 | biostudies-other | 1998 Dec

REPOSITORIES: biostudies-other

Publications

A homology identification method that combines protein sequence and structure information.

Yu L, White JV, Smith TF

Protein Science: a publication of the Protein Society, 1998-12-01, issue 12


Similar Datasets

| S-EPMC3078102 | biostudies-literature
| S-EPMC2660303 | biostudies-literature
| S-EPMC11344590 | biostudies-literature
| S-EPMC1988853 | biostudies-literature
| S-EPMC7537947 | biostudies-literature
| S-EPMC4908355 | biostudies-literature
| S-EPMC5793808 | biostudies-literature
2013-05-25 | E-GEOD-46611 | biostudies-arrayexpress
2013-05-25 | GSE46611 | GEO
| S-EPMC7900904 | biostudies-literature