Dataset Information

Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D.

ABSTRACT: Iterative homology search is widely used to identify remotely related proteins. Our previous study found that query-seeded iterative sequence search can reduce homologous over-extension errors and greatly improve selectivity. However, iterative homology search remains challenging for protein functional prediction. More sensitive scoring models are needed to improve the predictive performance of alignment methods, and better-visualized alignment annotation has become essential for interpreting results. Here we report PSISearch2D, an open-source application that runs query-seeded iterative sequence search to detect remotely related proteins. PSISearch2D retrieves domain annotation from Pfam, UniProtKB, CDD and PROSITE for the resulting hits and presents combined domain and sequence alignments in novel visualizations. A newly defined scoring model, the C-value, re-orders hits by taking both the sequence alignment and the domain alignment into account. Benchmarking with the C-value indicates that PSISearch2D outperforms the original PSISearch2 tool in both accuracy and specificity. PSISearch2D improves the characterization of unknown proteins in remote protein detection. Our evaluation tests show that PSISearch2D provided annotation for 77 695 of 139 503 unknown bacterial proteins and 140 751 of 352 757 unknown viral proteins in UniProtKB, about 2.3-fold and 1.8-fold more than the original PSISearch2, respectively. Together with an auto-iteration mode for handling large-scale data and optional programs for global and local sequence alignment, PSISearch2D enhances remotely related protein search.
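The coverage figures in the abstract can be read as proportions. The short Python sketch below simply divides the annotated counts by the totals quoted above; the variable and function names are illustrative and not part of PSISearch2D. The 2.3-fold and 1.8-fold comparisons cannot be recomputed this way, because the abstract does not state the corresponding PSISearch2 counts.

```python
# Counts quoted in the abstract: (annotated, total) unknown UniProtKB proteins.
# The dictionary keys and function name are illustrative, not from PSISearch2D.
counts = {
    "bacteria": (77_695, 139_503),
    "virus": (140_751, 352_757),
}


def annotated_fraction(annotated: int, total: int) -> float:
    """Return the share of proteins that received an annotation."""
    return annotated / total


fractions = {name: annotated_fraction(a, t) for name, (a, t) in counts.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of unknown proteins annotated")
```

By this arithmetic, a little over half of the unknown bacterial proteins and roughly two fifths of the unknown viral proteins received an annotation.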

SUBMITTER: Yang M

PROVIDER: S-EPMC6637259 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature

Publications

Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D.

Yang Minglei, Zhang Wenliang, Yao Guocai, Zhang Haiyue, Li Weizhong

Database: the journal of biological databases and curation, 2019-01-01

PMID: 31317184

Similar Datasets

pFlexAna: detecting conformational changes in remotely related proteins.
| S-EPMC2447781 | biostudies-literature

A search for energy minimized sequences of proteins.
| S-EPMC2724685 | biostudies-literature

Increasing sequence search sensitivity with transitive alignments.
| S-EPMC3573025 | biostudies-literature

Detecting remotely related proteins by their interactions and sequence similarity.
| S-EPMC1129109 | biostudies-literature

A simple method for finding related sequences by adding probabilities of alternative alignments.
| S-EPMC11444175 | biostudies-literature

Master Blaster: an approach to sensitive identification of remotely related proteins.
| S-EPMC8062480 | biostudies-literature

Using HHsearch to tackle proteins of unknown function: A pilot study with PH domains.
| S-EPMC5091641 | biostudies-literature

Annotating RNA motifs in sequences and alignments.
| S-EPMC4333381 | biostudies-literature

Intrinsically disordered domains deviate significantly from random sequences in mammalian proteins.
| S-EPMC2957690 | biostudies-literature

Genome-wide search for eliminylating domains reveals novel function for BLES03-like proteins.
| S-EPMC4159009 | biostudies-literature