Dataset Information

THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes.

ABSTRACT: New methodology must be developed to improve the ability to characterize the growing number of amino acid sequences, which vastly exceeds the number of experimentally determined protein structures. Homologous proteins can be used as structural templates for modeling proteins that do not have experimentally determined structures. However, in many cases there are no close homologs with determined structures from which a query sequence can be reliably modeled; the available templates typically share <30% sequence identity with the query. The aim of protein threading is to use features such as secondary structure, solvent accessibility and torsion angles, in addition to sequence patterns, to identify structural templates in the Protein Data Bank that assist in full-length, atomic-level structural modeling. However, there are still numerous protein sequences for which correct templates cannot be recognized. This raises the question of which attributes allow query sequences to be matched to correct but distantly homologous templates. To aid the investigation of this question and to provide genome-scale protein structure models for the biological community, a database called THE-DB (Threading Hard and Easy protein DataBase) has been developed. It makes it possible to analyze over 15 000 query sequences from the Escherichia coli (E. coli) K12 and human proteomes. It also lets users retrieve their three-dimensional templates identified by state-of-the-art threading algorithms, which is not feasible with existing protein template databases. The E. coli K12 and human data can be downloaded in bulk from the THE-DB page.

SUBMITTER: Diamond JS

PROVIDER: S-EPMC6146127 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes.

Diamond JS, Zhang Y

Database: The Journal of Biological Databases and Curation, 2018 Jan 1

PMID: 30239678

Similar Datasets

MUFOLD-DB: a processed protein structure database for protein structure prediction and analysis.
| S-EPMC4304177 | biostudies-literature

Mapping monomeric threading to protein-protein structure prediction.
| S-EPMC4076494 | biostudies-literature

NSort/DB: an intranuclear compartment protein database.
| S-EPMC5054713 | biostudies-literature

SDOP-DB: a comparative standardized-protocol database for mouse phenotypic analyses.
| S-EPMC2853689 | biostudies-literature

A conditional neural fields model for protein threading.
| S-EPMC3371845 | biostudies-literature

Protein-protein complex structure predictions by multimeric threading and template recombination.
| S-EPMC3134792 | biostudies-literature

P(3)DB: An Integrated Database for Plant Protein Phosphorylation.
| S-EPMC3435559 | biostudies-literature

The iPPI-DB initiative: a community-centered database of protein-protein interaction modulators.
| S-EPMC8034526 | biostudies-literature

Database of RNA binding protein expression and disease dynamics (READ DB).
| S-EPMC4515031 | biostudies-literature

ModBase, a database of annotated comparative protein structure models, and associated resources.
| S-EPMC3013688 | biostudies-literature