Dataset Information

More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology.

ABSTRACT: Large-scale genome sequencing gained general importance for life science because functional annotation of otherwise experimentally uncharacterized sequences is made possible by the theory of biomolecular sequence homology. Historically, the paradigm of similarity of protein sequences implying common structure, function and ancestry was generalized based on studies of globular domains. Having the same fold imposes strict conditions on the packing in the hydrophobic core, requiring similarity of hydrophobic patterns. The implications of sequence similarity among non-globular protein segments have not been studied to the same extent; nevertheless, homology considerations are silently extended to them. This appears especially detrimental in the case of transmembrane helices (TMs) and signal peptides (SPs), where sequence similarity is necessarily a consequence of physical requirements rather than common ancestry. Thus, matching of SPs/TMs creates the illusion of matching hydrophobic cores. Therefore, inclusion of SPs/TMs into domain models can give rise to wrong annotations. More than 1,001 domains among the 10,340 models of Pfam release 23 and 18 domains of SMART version 6 (out of 809) contain SP/TM regions. As expected, fragment-mode HMM searches generate promiscuous hits limited solely to the SP/TM part among clearly unrelated proteins. More worryingly, we show explicit examples in which the scores of clearly false-positive hits, even in global-mode searches, can be elevated into the significance range just by matching the hydrophobic runs. In the PIR iProClass database v3.74, using conservative criteria, we find that at least between 2.1% and 13.6% of its annotated Pfam hits appear unjustified for a set of validated domain models. Thus, false-positive domain hits enforced by SP/TM regions can lead to dramatic annotation errors where the hit has nothing in common with the problematic domain model except the SP/TM region itself. We suggest a workflow of flagging problematic hits arising from SP/TM-containing models for critical reconsideration by annotation users.
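
The flagging idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' workflow: the `DomainHit` type, the `covered_fraction` and `flag_hits` helpers, the coordinates and the 0.8 coverage threshold are all hypothetical, chosen only to show how a domain hit that lies mostly inside a predicted SP/TM segment might be marked for manual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    model: str  # domain model identifier (hypothetical example value)
    start: int  # 1-based inclusive start of the hit on the query sequence
    end: int    # 1-based inclusive end of the hit on the query sequence


def covered_fraction(hit, segments):
    """Return the fraction of hit residues inside any predicted SP/TM segment."""
    covered = set()
    for seg_start, seg_end in segments:
        lo, hi = max(hit.start, seg_start), min(hit.end, seg_end)
        covered.update(range(lo, hi + 1))  # empty range when there is no overlap
    return len(covered) / (hit.end - hit.start + 1)


def flag_hits(hits, segments, threshold=0.8):
    """Return hits whose SP/TM coverage reaches the (assumed) threshold."""
    return [h for h in hits if covered_fraction(h, segments) >= threshold]


# Hypothetical example: one predicted signal peptide at residues 1-22.
segments = [(1, 22)]
hits = [DomainHit("model_A", 1, 25), DomainHit("model_B", 40, 200)]
flagged = flag_hits(hits, segments)
```

Here `model_A` would be flagged because 22 of its 25 aligned residues fall inside the predicted signal peptide, while `model_B` does not overlap it at all. A flag of this kind only marks a hit for critical reconsideration; it does not by itself show that the annotation is wrong.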

SUBMITTER: Wong WC

PROVIDER: S-EPMC2912341 | biostudies-literature | 2010

REPOSITORIES: biostudies-literature

Publications

More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology.

Wong WC, Maurer-Stroh S, Eisenhaber F

PLoS Computational Biology, 2010-07-29, issue 7

PMID: 20686689

Similar Datasets

ECOD: identification of distant homology among multidomain and transmembrane domain proteins. | S-EPMC6588880 | biostudies-literature
Differential bacterial surface display of peptides by the transmembrane domain of OmpA. | S-EPMC2726941 | biostudies-literature
Transmembrane insertion of twin-arginine signal peptides is driven by TatC and regulated by TatB. | S-EPMC3538955 | biostudies-literature
Transmembrane domain length of viral K+ channels is a signal for mitochondria targeting. | S-EPMC2518832 | biostudies-literature
Homology modeling and molecular dynamics simulations of transmembrane domain structure of human neuronal nicotinic acetylcholine receptor. | S-EPMC1305108 | biostudies-literature
Treating pancreatic cancer: more antioxidants more problems? | S-EPMC6317716 | biostudies-literature
Transmembrane signaling and cytoplasmic signal conversion by dimeric transmembrane helix 2 and a linker domain of the DcuS sensor kinase. | S-EPMC7857512 | biostudies-literature
PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases. | S-EPMC4987888 | biostudies-literature
Transmembrane helix orientation influences membrane binding of the intracellular juxtamembrane domain in Neu receptor peptides. | S-EPMC3562775 | biostudies-literature
Single tryptophan and tyrosine comparisons in the N-terminal and C-terminal interface regions of transmembrane GWALP peptides. | S-EPMC3934079 | biostudies-literature