Dataset Information

THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes.


ABSTRACT: New methodology must be developed to improve the ability to characterize the growing number of amino acid sequences, which vastly exceeds the number of experimentally determined protein structures. Homologous proteins can be used as structural templates for modeling proteins that do not have experimentally determined structures. However, in many cases, there are no homologous proteins with determined structures (sequence identity is typically <30%) from which a query sequence can be reliably modeled. The aim of protein threading is to use features such as secondary structure, solvent accessibility and torsional angles, in addition to sequence patterns, to identify structural templates from the Protein Data Bank to assist full-length atomic-level structural modeling. However, there are still numerous protein sequences for which correct templates cannot be recognized. This raises the question of what attributes allow query sequences to be matched to correct but distantly homologous templates. To aid the investigation of this question and to provide genome-scale protein structure models for the biological community, a database called THE-DB (threading hard and easy protein database) has been developed. It makes it possible to analyze over 15 000 query sequences from the Escherichia coli (E. coli) K12 and human proteomes and to find their three-dimensional templates derived from state-of-the-art threading algorithms, which is not feasible with existing protein template databases. The E. coli K12 and human data can be downloaded in bulk from the THE-DB page.

SUBMITTER: Diamond JS 

PROVIDER: S-EPMC6146127 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes.

Diamond Justin S, Zhang Yang

Database: the journal of biological databases and curation, 2018-01-01

Similar Datasets

| S-EPMC4304177 | biostudies-literature
| S-EPMC5054713 | biostudies-literature
| S-EPMC4076494 | biostudies-literature
| S-EPMC2853689 | biostudies-literature
| S-EPMC3435559 | biostudies-literature
| S-EPMC3134792 | biostudies-literature
| S-EPMC3371845 | biostudies-other
| S-EPMC8034526 | biostudies-literature
| S-EPMC4515031 | biostudies-literature
| S-EPMC2142932 | biostudies-other