Dataset Information

A data-mining approach for multiple structural alignment of proteins.

ABSTRACT: Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when far less information and far fewer assumptions are available. We model the structural alignment of proteins as a combinatorial problem. In this problem, each protein is simply a set of points in 3D space, without sequence order information, and the objective is to discover all sufficiently large alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures so that points with similar locations in 3D space are hashed into the same bin of the hash table. The novelty is that we treat each bin as a coincidence group and mine for frequent patterns, a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. A simple heuristic is then used to extend the alignments where possible. We implemented the algorithm and tested it using real protein structures, and compared the results with those of existing tools. The results showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those occurring in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture that includes dissimilar ones. The running time of the program was shorter than or comparable to that of the existing tools.
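The hashing and mining steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the structures are already expressed in a common coordinate frame (full geometric hashing would also enumerate reference frames), it uses a plain cubic grid as the hash, and it counts co-occurrences by brute-force subset enumeration rather than a dedicated frequent-pattern miner. The names `hash_points`, `frequent_groups`, `cell`, and `min_support`, and the toy coordinates, are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hash_points(proteins, cell=2.0):
    """Map each grid cell (bin) to the set of protein ids with a point in it.

    Assumption: all proteins share one coordinate frame, so nearby points
    from different proteins fall into the same cubic cell of side `cell`.
    """
    bins = defaultdict(set)
    for pid, points in proteins.items():
        for x, y, z in points:
            key = (int(x // cell), int(y // cell), int(z // cell))
            bins[key].add(pid)
    return bins


def frequent_groups(bins, min_support, min_size=2):
    """Treat each bin as a coincidence group and count co-occurring subsets.

    The support of a subset of proteins is the number of bins in which all
    of its members appear; it serves as a rough proxy for alignment size.
    Brute-force enumeration: only suitable for small coincidence groups.
    """
    support = defaultdict(int)
    for members in bins.values():
        ordered = sorted(members)
        for k in range(min_size, len(ordered) + 1):
            for subset in combinations(ordered, k):
                support[subset] += 1
    return {s: n for s, n in support.items() if n >= min_support}


# Toy input: three "proteins" given as unordered 3D point sets (made-up data).
proteins = {
    "A": [(0.1, 0.2, 0.3), (4.1, 0.2, 0.3), (8.5, 8.5, 8.5)],
    "B": [(0.5, 0.9, 1.1), (4.9, 1.0, 0.1), (20.0, 20.0, 20.0)],
    "C": [(0.3, 0.3, 0.3), (30.0, 30.0, 30.0), (40.0, 0.0, 0.0)],
}
result = frequent_groups(hash_points(proteins, cell=2.0), min_support=2)
print(result)
```

In this toy input, only the pair A and B shares at least two bins, so it is the only group reported. The extension heuristic mentioned in the abstract is not covered by the sketch.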

SUBMITTER: Siu WY

PROVIDER: S-EPMC2951672 | biostudies-literature | 2010 Feb

REPOSITORIES: biostudies-literature


Publications

A data-mining approach for multiple structural alignment of proteins.

Siu Wing-Yan, Mamoulis Nikos, Yiu Siu-Ming, Chan Ho-Leung

Bioinformation, 2010-02-28, 8


PMID: 21079664

Similar Datasets

Project description: Peptides have recently regained interest as therapeutic candidates, but their development still faces several limitations, including low bioavailability. Backbone head-to-tail cyclization, i.e., forming a covalent peptide bond that links the last amino acid to the first, is one effective strategy in peptide-based drug design for stabilizing the conformation of bioactive peptides while preserving desirable peptide properties (low toxicity, binding affinity, target selectivity) and preventing enzymatic degradation. Starting from an active peptide, it usually requires designing a linker of a few amino acids that makes cyclization possible, ideally preserving the conformation of the initial peptide without affecting its activity. However, very little is known about the sequence-structure relationships needed to design linkers for peptide cyclization in a rational manner. Recently, we have shown that large-scale data mining of available protein structures can lead to the precise identification of protein loop conformations, even from remote structural classes. Here, we transpose this approach to linkers that allow head-to-tail peptide cyclization. First, we show that, given a linker sequence and the conformation of the linear peptide, it is possible to accurately predict the conformation of the cyclized peptide. Second, and more importantly, we show that it seems possible to build on the information inferred from protein structures to propose effective candidate linker sequences constrained by length and amino acid composition, providing the first framework for the rational design of head-to-tail cyclization linkers. Finally, we illustrate this for two peptides, using a limited set of amino acids that are unlikely to interfere with peptide function. For a linear peptide derived from Nrf2, the peptide cyclized starting from the experimental structure showed a 26-fold increase in binding affinity. For urotensin II, a peptide already cyclized by a disulfide bond that exerts a broad array of biological activities, we were able, starting from models of the structure, to design a head-to-tail cyclized peptide, the first synthesized bicyclic 14-residue urotensin II analogue, which retained in vitro activity. Although preliminary, our results strongly suggest that such an approach has considerable potential for cyclic peptide-based drug design.
