Dataset Information

Protein subfamily assignment using the Conserved Domain Database.

ABSTRACT:

Background

Domains, evolutionarily conserved units of proteins, are widely used to classify protein sequences and infer protein function. Often, two or more overlapping domain models match a region of a protein sequence. Therefore, procedures are required to choose appropriate domain annotations for the protein. Here, we propose a method for assigning NCBI-curated domains from the Conserved Domain Database (CDD) that takes into account the organization of the domains into hierarchies of homologous domain models.

Findings

Our analysis of alignment scores from NCBI-curated domain assignments suggests that identifying the correct model among closely related models is more difficult than choosing between non-overlapping domain models. We find that simple heuristics based on sorting scores and domain-specific thresholds are effective at reducing classification error. In our test set, the heuristics replace almost 90% of the current misclassifications caused by missing domain subfamilies with more generic domain assignments, thereby eliminating a significant amount of error within the database.
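
As a rough illustration only, the kind of heuristic described above (sort hits by score, apply a domain-specific threshold, fall back to a more generic model in the hierarchy) might be sketched as follows. This is not the rule implemented in CD-Search. The model names, the threshold values, the hierarchy, and the function name assign_domain are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Hit:
    model: str    # hypothetical domain model identifier
    score: float  # alignment score of the query region against this model


# ASSUMPTION: a toy hierarchy mapping each model to its more generic parent.
PARENT: Dict[str, Optional[str]] = {
    "sub_A1": "fam_A",
    "sub_A2": "fam_A",
    "fam_A": None,
}

# ASSUMPTION: made-up domain-specific score thresholds.
THRESHOLD: Dict[str, float] = {
    "sub_A1": 120.0,
    "sub_A2": 110.0,
    "fam_A": 50.0,
}


def assign_domain(hits: List[Hit]) -> Optional[str]:
    """Pick one annotation for a region matched by overlapping models."""
    if not hits:
        return None
    scores = {h.model: h.score for h in hits}
    # Start from the best-scoring model.
    model: Optional[str] = max(hits, key=lambda h: h.score).model
    # Walk up the hierarchy until some model's own score passes its
    # own threshold; models without a hit or a threshold are skipped.
    while model is not None:
        score = scores.get(model)
        if score is not None and score >= THRESHOLD.get(model, float("inf")):
            return model
        model = PARENT.get(model)
    return None


if __name__ == "__main__":
    hits = [Hit("sub_A1", 100.0), Hit("sub_A2", 95.0), Hit("fam_A", 90.0)]
    print(assign_domain(hits))
```

In this toy case neither subfamily score is convincing, so the region receives the generic family annotation instead of a possibly wrong subfamily, which is the kind of replacement the findings describe.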

Conclusion

Our proposed domain subfamily assignment rule has been incorporated into the CD-Search software for assigning CDD domains to query protein sequences and has significantly improved pre-calculated domain annotations on protein sequences in NCBI's Entrez resource.

SUBMITTER: Fong JH

PROVIDER: S-EPMC2632666 | biostudies-literature | 2008 Nov

REPOSITORIES: biostudies-literature

Publications

Protein subfamily assignment using the Conserved Domain Database.

Fong JH, Marchler-Bauer A

BMC Research Notes, 2008 Nov 14

PMID: 19014584

Similar Datasets

NCBI's Conserved Domain Database and Tools for Protein Domain Analysis.
| S-EPMC7378889 | biostudies-literature

NMR assignment of the conserved bacterial DNA replication protein DnaA domain IV.
| S-EPMC11511705 | biostudies-literature

CDD: NCBI's conserved domain database.
| S-EPMC4383992 | biostudies-literature

Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database.
| S-EPMC155287 | biostudies-literature

Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index.
| S-EPMC1764483 | biostudies-literature

Domain-mediated interactions for protein subfamily identification.
| S-EPMC6959277 | biostudies-literature

Annotation of functional sites with the Conserved Domain Database.
| S-EPMC3308149 | biostudies-literature

Using a customised database for eDNA fish assignment (OSU approach)
| PRJEB57749 | ENA

Improving protein structure similarity searches using domain boundaries based on conserved sequence information.
| S-EPMC2694201 | biostudies-literature

DOMINE: a database of protein domain interactions.
| S-EPMC2238965 | biostudies-literature