Dataset Information

Using homology relations within a database markedly boosts protein sequence similarity search.

ABSTRACT: Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.

SUBMITTER: Tong J

PROVIDER: S-EPMC4460465 | biostudies-literature | 2015 Jun

REPOSITORIES: biostudies-literature

Publications

Using homology relations within a database markedly boosts protein sequence similarity search.

Tong J, Sadreyev RI, Pei J, Kinch LN, Grishin NV

Proceedings of the National Academy of Sciences of the United States of America. 2015 May 18; issue 22.

PMID: 26038555

Similar Datasets

SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.
| S-EPMC4699916 | biostudies-literature

A large-scale assessment of sequence database search tools for homology-based protein function prediction.
| S-EPMC11262835 | biostudies-literature

A large-scale assessment of sequence database search tools for homology-based protein function prediction.
| S-EPMC10680702 | biostudies-literature

Minimally-overlapping words for sequence similarity search.
| S-EPMC8016470 | biostudies-literature

GHOSTX: an improved sequence homology search algorithm using a query suffix array and a database suffix array.
| S-EPMC4123905 | biostudies-literature

PLMSearch: Protein language model powers accurate and fast sequence search for remote homology.
| S-EPMC10981738 | biostudies-literature

Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix.
| S-EPMC5274646 | biostudies-literature

Identification of protein biochemical functions by similarity search using the molecular surface database eF-site.
| S-EPMC2323945 | biostudies-literature

Similarity search for local protein structures at atomic resolution by exploiting a database management system.
| S-EPMC5036654 | biostudies-literature

CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search.
| S-EPMC11531700 | biostudies-literature