Dataset Information

ThreaDom: extracting protein domain boundary information from multiple threading alignments.

ABSTRACT:

Motivation

Protein domains are subunits that can fold and evolve independently. Identification of domain boundary locations is often the first step in protein folding and function annotations. Most of the current methods deduce domain boundaries by sequence-based analysis, which has low accuracy. There is no efficient method for predicting discontinuous domains that consist of segments from separated sequence regions. As template-based methods are most efficient for protein 3D structure modeling, combining multiple threading alignment information should increase the accuracy and reliability of computational domain predictions.

Results

We developed a new protein domain predictor, ThreaDom, which deduces domain boundary locations from multiple threading alignments. The core of the method is a domain conservation score that combines information from template domain structures with terminal and internal alignment gaps. Tested on 630 non-redundant sequences, without using homologous templates, ThreaDom generates correct single- and multi-domain classifications in 81% of cases, where 78% have the domain linker assigned within ±20 residues. In a second test on 486 proteins with discontinuous domains, ThreaDom achieves an average precision of 84% and recall of 65% in domain boundary prediction. Finally, ThreaDom was examined on 56 targets from CASP8 and had domain overlap rates of 73, 87 and 85% with the target for Free Modeling, Hard multiple-domain and discontinuous domain proteins, respectively, which are significantly higher than those of most domain predictors in CASP8. Similar results were achieved on targets from the more recent CASP9 and CASP10 experiments.
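To make the boundary metrics above concrete, here is a minimal sketch of how precision and recall could be computed for predicted domain boundaries with a ±20-residue tolerance. The function name `boundary_precision_recall`, the greedy one-to-one matching rule, and the example positions are all assumptions for illustration. The abstract does not specify the exact scoring protocol, so this is not necessarily how ThreaDom was evaluated.

```python
from typing import List, Tuple


def boundary_precision_recall(
    predicted: List[int], true: List[int], tolerance: int = 20
) -> Tuple[float, float]:
    """Score predicted boundary positions against true ones.

    Assumption: a predicted boundary is correct if it lies within
    `tolerance` residues of a true boundary that has not already been
    matched (greedy one-to-one matching, hypothetical rule).
    """
    unmatched = list(true)
    hits = 0
    for p in sorted(predicted):
        # Find the nearest true boundary that is still unmatched.
        nearest = min(unmatched, key=lambda t: abs(t - p), default=None)
        if nearest is not None and abs(nearest - p) <= tolerance:
            hits += 1
            unmatched.remove(nearest)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


# Illustrative (made-up) residue positions, not data from the study.
precision, recall = boundary_precision_recall([95, 210, 400], [100, 220, 300])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example, two of the three predictions fall within 20 residues of a true boundary, so both precision and recall are 2/3. A stricter or looser tolerance changes which predictions count as hits.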

Availability

http://zhanglab.ccmb.med.umich.edu/ThreaDom/.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Xue Z

PROVIDER: S-EPMC3694664 | biostudies-literature | 2013 Jul

REPOSITORIES: biostudies-literature

Publications

ThreaDom: extracting protein domain boundary information from multiple threading alignments.

Xue Zhidong, Xu Dong, Wang Yan, Zhang Yang

Bioinformatics (Oxford, England), 2013-07-01, issue 13

PMID: 23812990

Similar Datasets

Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures. | S-EPMC6315940 | biostudies-literature

MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. | S-EPMC2666101 | biostudies-literature

A multiple-template approach to protein threading. | S-EPMC3092796 | biostudies-literature

Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments. | S-EPMC2582601 | biostudies-literature

Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments. | S-EPMC7297217 | biostudies-literature

A new approach for extracting information from protein dynamics. | S-EPMC8936122 | biostudies-literature

A new approach for extracting information from protein dynamics. | S-EPMC9844508 | biostudies-literature

THoR: a tool for domain discovery and curation of multiple alignments. | S-EPMC193644 | biostudies-literature

Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments. | S-EPMC4046016 | biostudies-literature

DMAPS: a database of multiple alignments for protein structures. | S-EPMC1347381 | biostudies-literature