Dataset Information

Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure.

ABSTRACT: BACKGROUND: The present knowledge of protein structures at the atomic level derives from some 60,000 molecules. Yet the exponentially growing set of hypothetical protein sequences comprises some 10 million chains, and this makes protein structure prediction one of the challenging goals of bioinformatics. In this context, the representation of proteins with contact maps is an intermediate step of fold recognition and constitutes the input of contact map predictors. However, contact map representations require fast and reliable methods to reconstruct the specific folding of the protein backbone. METHODS: In this paper, by adopting GRID technology, our algorithm for 3D reconstruction, FT-COMAR, is benchmarked on a large set of non-redundant proteins (1716), taking random noise into consideration; this makes our computation the largest ever performed for the task at hand. RESULTS: We observe the effects of introducing random noise on 3D reconstruction and derive some considerations useful for future implementations. The size of the protein set also allows statistical considerations after grouping by SCOP structural class. CONCLUSIONS: Altogether, our data indicate that the quality of 3D reconstruction is unaffected by deleting up to an average of 75% of the real contacts, while only a few percent of randomly generated contacts in place of non-contacts are sufficient to hamper 3D reconstruction.
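The two kinds of noise described in the abstract (deleting real contacts, and adding random contacts in place of non-contacts) can be illustrated with a small sketch. This is not the authors' FT-COMAR code or benchmark protocol: the function names `contact_map` and `blur`, the 8 Å distance threshold, and the way fractions are rounded are all assumptions made for illustration.

```python
import random

def contact_map(coords, threshold=8.0):
    """Binary contact map from 3D coordinates (one point per residue).

    The 8.0 threshold is an assumed value, not taken from the abstract.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) ** 0.5
            if d < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

def blur(cmap, delete_frac=0.0, add_frac=0.0, seed=0):
    """Return a noisy copy of a symmetric contact map.

    delete_frac: fraction of real contacts removed at random.
    add_frac:    fraction of non-contacts turned into false contacts.
    """
    rng = random.Random(seed)
    n = len(cmap)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    contacts = [p for p in pairs if cmap[p[0]][p[1]] == 1]
    non_contacts = [p for p in pairs if cmap[p[0]][p[1]] == 0]
    noisy = [row[:] for row in cmap]
    for i, j in rng.sample(contacts, round(delete_frac * len(contacts))):
        noisy[i][j] = noisy[j][i] = 0
    for i, j in rng.sample(non_contacts, round(add_frac * len(non_contacts))):
        noisy[i][j] = noisy[j][i] = 1
    return noisy

def count_contacts(cmap):
    """Number of contacts in the upper triangle of the map."""
    n = len(cmap)
    return sum(cmap[i][j] for i in range(n) for j in range(i + 1, n))

# Toy chain: 10 points on a straight line, 3.8 units apart.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
toy_map = contact_map(toy_coords)
sparse_map = blur(toy_map, delete_frac=0.75)
noisy_map = blur(toy_map, add_frac=0.25)
```

A blurred map produced this way would then be passed to a reconstruction method, and the recovered structure compared with the original, to measure how much each kind of noise degrades the result.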

SUBMITTER: Vassura M

PROVIDER: S-EPMC3033854 | biostudies-literature | 2011 Jan

REPOSITORIES: biostudies-literature


Publications

Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure.

Vassura Marco, Di Lena Pietro, Margara Luciano, Mirto Maria, Aloisio Giovanni, Fariselli Piero, Casadio Rita

BioData Mining, 2011-01-13, 1


PMID: 21232136

Similar Datasets

Project description: Background: The development of chromosomal conformation capture techniques, particularly the Hi-C technique, has made the analysis and study of the spatial conformation of a genome an important topic in bioinformatics and computational biology. Aided by high-throughput next-generation sequencing techniques, the Hi-C technique can generate genome-wide, large-scale intra- and inter-chromosomal interaction data capable of describing in detail the spatial interactions within a genome. These data can be used to reconstruct 3D structures of chromosomes that can be used to study DNA replication, gene regulation, genome interaction, genome folding, and genome function. Results: Here, we introduce a maximum likelihood algorithm called 3DMax to construct the 3D structure of a chromosome from Hi-C data. 3DMax employs a maximum likelihood approach to infer the 3D structures of a chromosome, while automatically re-estimating the conversion factor (α) for converting Interaction Frequency (IF) to distance. Our results show that the models generated by 3DMax from a simulated Hi-C dataset match the true models better than most of the existing methods. 3DMax is more robust to structural variability and noise. Compared on a real Hi-C dataset, 3DMax constructs chromosomal models that fit the data better than most methods, and it is faster than all other methods. The models reconstructed by 3DMax were consistent with fluorescent in situ hybridization (FISH) experiments and existing knowledge about the organization of human chromosomes, such as chromosome compartmentalization. Conclusions: 3DMax is an effective approach to reconstructing 3D chromosomal models. The results and the models generated for the simulated and real Hi-C datasets are available here: http://sysbio.rnet.missouri.edu/bdm_download/3DMax/ . The source code is available here: https://github.com/BDM-Lab/3DMax . A short video demonstrating how to use 3DMax can be found here: https://youtu.be/ehQUFWoHwfo .
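The role of the conversion factor α mentioned above can be sketched as follows. The description does not state the formula, so the inverse power-law relation used here (distance = 1 / IF^α) is an assumption, as are the function name `if_to_distance` and the default α of 0.5. The code is not taken from the 3DMax repository.

```python
def if_to_distance(if_matrix, alpha=0.5):
    """Convert interaction frequencies to target distances.

    Assumes distance = 1 / IF**alpha (an illustrative relation, not
    confirmed by the description above). Pairs with zero frequency get
    None, meaning no distance restraint is derived for them.
    """
    return [
        [1.0 / (freq ** alpha) if freq > 0 else None for freq in row]
        for row in if_matrix
    ]

# Toy 3x3 symmetric interaction-frequency matrix (made-up values).
toy_if = [
    [0, 4, 1],
    [4, 0, 16],
    [1, 16, 0],
]
toy_dist = if_to_distance(toy_if, alpha=0.5)
```

Under this assumed relation, higher interaction frequencies map to shorter target distances, and changing α rescales how quickly distance falls off with frequency.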
