
Dataset Information


Protein subfamily assignment using the Conserved Domain Database.


ABSTRACT: BACKGROUND: Domains, evolutionarily conserved units of proteins, are widely used to classify protein sequences and infer protein function. Often, two or more overlapping domain models match a region of a protein sequence. Therefore, procedures are required to choose appropriate domain annotations for the protein. Here, we propose a method for assigning NCBI-curated domains from the Conserved Domain Database (CDD) that takes into account the organization of the domains into hierarchies of homologous domain models. FINDINGS: Our analysis of alignment scores from NCBI-curated domain assignments suggests that identifying the correct model among closely related models is more difficult than choosing between non-overlapping domain models. We find that simple heuristics based on sorting scores and domain-specific thresholds are effective at reducing classification error. In fact, in our test set, the heuristics result in almost 90% of current misclassifications due to missing domain subfamilies being replaced by more generic domain assignments, thereby eliminating a significant amount of error within the database. CONCLUSION: Our proposed domain subfamily assignment rule has been incorporated into the CD-Search software for assigning CDD domains to query protein sequences and has significantly improved pre-calculated domain annotations on protein sequences in NCBI's Entrez resource.
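The abstract describes the assignment rule only in outline: sort the overlapping hits by score, apply domain-specific thresholds, and fall back to a more generic model in the hierarchy when a subfamily cannot be supported. The Python sketch below illustrates that general idea and is not the CD-Search implementation. The `Hit` class, the `assign_domain` function, the model names, the parent map, and every threshold value are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Hit:
    model: str    # domain model identifier (hypothetical names below)
    score: float  # alignment score of the query region against that model


def assign_domain(
    hits: List[Hit],
    parent: Dict[str, Optional[str]],
    threshold: Dict[str, float],
) -> Optional[str]:
    """Choose one model among overlapping hits from a single hierarchy.

    Assumption: start from the best-scoring model and walk toward the root
    until a model is found whose own score meets its own threshold.
    """
    if not hits:
        return None
    # Sort so that the best-scoring model is considered first.
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    scores = {h.model: h.score for h in ranked}
    model: Optional[str] = ranked[0].model
    while model is not None:
        score = scores.get(model)
        # A model without a listed threshold is never accepted (assumption).
        if score is not None and score >= threshold.get(model, float("inf")):
            return model
        # Fall back to the more generic parent model.
        model = parent.get(model)
    return None


# Hypothetical two-level hierarchy: two subfamilies under one family model.
PARENT = {"sub_A": "family", "sub_B": "family", "family": None}
THRESHOLD = {"sub_A": 200.0, "sub_B": 180.0, "family": 100.0}

if __name__ == "__main__":
    hits = [Hit("sub_A", 170.0), Hit("family", 160.0), Hit("sub_B", 140.0)]
    print(assign_domain(hits, PARENT, THRESHOLD))
```

In this example the top-scoring subfamily falls short of its own threshold, so the sketch reports the parent model instead. This mirrors the replacement of unsupported subfamily assignments by more generic ones that the abstract mentions.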

SUBMITTER: Fong JH 

PROVIDER: S-EPMC2632666 | biostudies-literature | 2008

REPOSITORIES: biostudies-literature


Publications

Protein subfamily assignment using the Conserved Domain Database.

Jessica H. Fong, Aron Marchler-Bauer

BMC Research Notes, 2008-11-14



Similar Datasets

| S-EPMC7378889 | biostudies-literature
| S-EPMC4383992 | biostudies-literature
| S-EPMC155287 | biostudies-literature
| S-EPMC1764483 | biostudies-literature
| S-EPMC6959277 | biostudies-literature
| S-EPMC3308149 | biostudies-literature
| PRJEB57749 | ENA
| S-EPMC3057503 | biostudies-literature
| S-EPMC4271147 | biostudies-other
| S-EPMC51501 | biostudies-other