Dataset Information

The identification of complete domains within protein sequences using accurate E-values for semi-global alignment.

ABSTRACT: The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a 'semi-global alignment'. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance.
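The distinction the abstract draws between semi-global and local alignment can be illustrated with a small dynamic-programming sketch. This is not the GLOBAL tool or its HMM scoring; it is a minimal example that assumes a simple match/mismatch score with a linear gap penalty, and the function name and parameter values are hypothetical. The whole domain must be aligned, while the start and end positions within the protein are free.

```python
def semi_global_align(domain, protein, match=2, mismatch=-1, gap=-2):
    """Best score for aligning ALL of `domain` to some substring of `protein`.

    Illustrative scoring only (assumed values), not the scoring used by GLOBAL.
    Returns (score, end), where `end` is the 1-based protein position at
    which the best alignment finishes.
    """
    m = len(protein)
    # Row 0: an empty domain prefix may start anywhere in the protein at no cost.
    prev = [0] * (m + 1)
    for i in range(1, len(domain) + 1):
        # Column 0: domain residues placed before the protein begins cost a gap each.
        curr = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if domain[i - 1] == protein[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align domain[i-1] with protein[j-1]
                prev[j] + gap,      # domain residue aligned to a gap
                curr[j - 1] + gap,  # protein residue aligned to a gap
            )
        prev = curr
    # The alignment may end anywhere in the protein, so take the best cell
    # of the final row (the row in which the whole domain has been consumed).
    end = max(range(m + 1), key=lambda j: prev[j])
    return prev[end], end
```

Under this scoring, a domain that is only partly present in the protein is penalized for its unmatched residues, whereas a local alignment would simply report the matching piece.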

SUBMITTER: Kann MG

PROVIDER: S-EPMC1950549 | biostudies-literature | 2007

REPOSITORIES: biostudies-literature

Publications

The identification of complete domains within protein sequences using accurate E-values for semi-global alignment.

Kann MG, Sheetlin SL, Park Y, Bryant SH, Spouge JL

Nucleic Acids Research, 2007-06-27, issue 14

PMID: 17596268

Similar Datasets

Introducing difference recurrence relations for faster semi-global alignment of long sequences.
| S-EPMC5836832 | biostudies-literature

BioAlign: An Accurate Global PPI Network Alignment Algorithm.
| S-EPMC9309777 | biostudies-literature

UPP2: fast and accurate alignment of datasets with fragmentary sequences.
| S-EPMC9846425 | biostudies-literature

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.
| S-EPMC1955456 | biostudies-literature

MACSIMS: multiple alignment of complete sequences information management system.
| S-EPMC1539025 | biostudies-literature

GOSSIP: a method for fast and accurate global alignment of protein structures.
| S-EPMC3065682 | biostudies-literature

HubAlign: an accurate and efficient method for global alignment of protein-protein interaction networks.
| S-EPMC4147903 | biostudies-literature

Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points.
| S-EPMC6330006 | biostudies-literature

Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.
| S-EPMC2732367 | biostudies-literature

Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses.
| S-EPMC9214144 | biostudies-literature