Dataset Information

The CATH extended protein-family database: providing structural annotations for genome sequences.


ABSTRACT: An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
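The two-tier thresholding described in the abstract can be sketched as follows. This is only an illustrative outline, not the DomainFinder implementation: the `Hit` record, the function names, and both E-value cut-offs are hypothetical, since the abstract does not give the actual thresholds or data model. Hits that pass the strict cut-off are assigned to a superfamily; hits that pass only the relaxed cut-off are held as putative neighbours.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical cut-offs; the abstract does not state the real values.
STRICT_EVALUE = 1e-10   # "conservative" threshold for a firm assignment
RELAXED_EVALUE = 1e-3   # "less stringent" threshold for neighbour tables


@dataclass(frozen=True)
class Hit:
    query_id: str     # identifier of the gene sequence being classified
    superfamily: str  # CATH homologous superfamily of the matched structure
    evalue: float     # E-value reported for the match


def classify_hit(hit: Hit) -> str:
    """Label a single hit as 'assigned', 'neighbour', or 'rejected'."""
    if hit.evalue <= STRICT_EVALUE:
        return "assigned"
    if hit.evalue <= RELAXED_EVALUE:
        return "neighbour"
    return "rejected"


def partition_hits(hits: Iterable[Hit]) -> Tuple[Dict[str, Hit], List[Hit]]:
    """Split hits into firm assignments (best hit per query) and neighbours."""
    assigned: Dict[str, Hit] = {}
    neighbours: List[Hit] = []
    for hit in hits:
        status = classify_hit(hit)
        if status == "assigned":
            best = assigned.get(hit.query_id)
            # Keep only the most significant firm hit for each query.
            if best is None or hit.evalue < best.evalue:
                assigned[hit.query_id] = hit
        elif status == "neighbour":
            neighbours.append(hit)
    return assigned, neighbours
```

Keeping the weaker matches in a separate list mirrors the abstract's description of putative relationships being stored apart from the main classification until further evidence is available.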

SUBMITTER: Pearl FM 

PROVIDER: S-EPMC2373435 | biostudies-literature | 2002 Feb

REPOSITORIES: biostudies-literature

Publications

The CATH extended protein-family database: providing structural annotations for genome sequences.

Pearl FMG, Lee D, Bray JE, Buchan DWA, Shepherd AJ, Orengo CA

Protein Science: a publication of the Protein Society, 2002-02-01, issue 2



Similar Datasets

| S-EPMC4384018 | biostudies-literature
| S-EPMC6323983 | biostudies-literature
| S-EPMC4489299 | biostudies-literature
| S-EPMC3525972 | biostudies-literature
| S-EPMC29791 | biostudies-literature
| S-EPMC155287 | biostudies-literature
| S-EPMC4678953 | biostudies-literature
| EGAD00010002152 | EGA
| S-EPMC5532967 | biostudies-literature
| S-EPMC5210578 | biostudies-literature