Dataset Information

A rapid classification protocol for the CATH Domain Database to support structural genomics.

ABSTRACT: In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives, has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.
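The staged idea in the abstract (try fast profile-based sequence analysis first, fall back to structure comparison, and record the stage at which a preliminary classification was reached) can be illustrated with a minimal sketch. This is not the CATH implementation: the function `classify`, the stage names, the score thresholds, and the lookup tables are all hypothetical placeholders, and the scorers merely stand in for tools such as PSI-BLAST or GRATH.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A hit is (superfamily label, score between 0 and 1); None means no match found.
# Both the label format and the score scale are assumptions for illustration.
Hit = Optional[Tuple[str, float]]
Scorer = Callable[[str], Hit]


@dataclass
class Assignment:
    domain_id: str
    superfamily: Optional[str]  # None when no stage produced a confident hit
    stage: str                  # name of the stage that made the assignment


def classify(domain_id: str, stages: List[Tuple[str, Scorer, float]]) -> Assignment:
    """Try each stage in order and stop at the first sufficiently confident hit."""
    for stage_name, scorer, threshold in stages:
        hit = scorer(domain_id)
        if hit is not None and hit[1] >= threshold:
            return Assignment(domain_id, hit[0], stage_name)
    return Assignment(domain_id, None, "unclassified")


# Toy lookup tables standing in for real search tools (placeholder data only).
sequence_hits: Dict[str, Tuple[str, float]] = {"domA": ("superfamily_1", 0.95)}
structure_hits: Dict[str, Tuple[str, float]] = {
    "domA": ("superfamily_1", 0.90),
    "domB": ("superfamily_2", 0.80),
}

# Cheaper stage first; thresholds are arbitrary illustrative values.
stages: List[Tuple[str, Scorer, float]] = [
    ("sequence_profile", sequence_hits.get, 0.9),
    ("structure_comparison", structure_hits.get, 0.7),
]

results = [classify(d, stages) for d in ("domA", "domB", "domC")]
```

Ordering the stages from cheapest to most expensive means easily recognisable homologues never reach the slower structure comparison, and the recorded `stage` plays the role of a preliminary classification level.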

SUBMITTER: Pearl FM

PROVIDER: S-EPMC29791 | biostudies-literature | 2001 Jan

REPOSITORIES: biostudies-literature

Publications

A rapid classification protocol for the CATH Domain Database to support structural genomics.

Pearl FM, Martin N, Bray JE, Buchan DW, Harrison AP, Lee D, Reeves GA, Shepherd AJ, Sillitoe I, Todd AE, Thornton JM, Orengo CA

Nucleic Acids Research, 2001 Jan 1, issue 1

PMID: 11125098

Similar Datasets

Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. | S-EPMC155287 | biostudies-literature

The history of the CATH structural classification of protein domains. | S-EPMC4678953 | biostudies-literature

The CATH extended protein-family database: providing structural annotations for genome sequences. | S-EPMC2373435 | biostudies-literature

Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. | S-EPMC4612221 | biostudies-literature

SCOR: a Structural Classification of RNA database. | S-EPMC99131 | biostudies-literature

SCOP database in 2002: refinements accommodate structural genomics. | S-EPMC99154 | biostudies-literature

TOPSAN: a dynamic web database for structural genomics. | S-EPMC3013775 | biostudies-literature

The CATH hierarchy revisited-structural divergence in domain superfamilies and the continuity of fold space. | S-EPMC2741583 | biostudies-literature

CATH: increased structural coverage of functional space. | S-EPMC7778904 | biostudies-literature