Dataset Information

Ranking near-native candidate protein structures via random forest classification.

ABSTRACT: BACKGROUND: In ab initio protein-structure prediction, a large set of structural decoys is often generated, from which the best five or three candidates must be selected. The central structures of the clusters with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures. RESULTS: To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features derived from the clustering results. Comparative analysis indicated that our method was better able to identify the order of the candidate structures than the current methods SPICKER, Calibur, and Durandal. The first model identified by our method was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. CONCLUSIONS: In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification problem. Our results indicate that this method re-ranks the candidate structures more effectively than the other methods compared.
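The re-ranking idea in the abstract can be illustrated with a small sketch. The abstract does not list the actual features, the labeling rule, or the forest configuration, so everything concrete below is an assumption: the three features (cluster size fraction, mean intra-cluster distance, mean distance to other cluster centers), the toy training values, the names `center_A` to `center_C`, and the use of depth-one trees as a dependency-free stand-in for a full random forest. It only shows the general shape of the approach: train a bagged binary classifier on labeled cluster centers, then sort new candidates by predicted probability of being near-native.

```python
import random
from typing import Dict, List, Tuple

# (feature index, threshold, P(near-native | x <= thr), P(near-native | x > thr))
Stump = Tuple[int, float, float, float]

def fit_stump(X: List[List[float]], y: List[int], feature: int) -> Stump:
    """Pick the threshold on one feature that misclassifies the fewest samples."""
    best = None
    for thr in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= thr]
        right = [lab for row, lab in zip(X, y) if row[feature] > thr]
        p_left = sum(left) / len(left)
        p_right = sum(right) / len(right) if right else p_left
        # Errors made if each side predicts its own majority class.
        errors = min(sum(left), len(left) - sum(left)) \
            + min(sum(right), len(right) - sum(right))
        if best is None or errors < best[0]:
            best = (errors, (feature, thr, p_left, p_right))
    return best[1]

def fit_forest(X: List[List[float]], y: List[int],
               n_trees: int = 50, seed: int = 0) -> List[Stump]:
    """Bagging with one randomly chosen feature per depth-one tree."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(d)                  # random feature choice
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_proba(forest: List[Stump], x: List[float]) -> float:
    """Average the per-tree probability that x is near-native."""
    votes = [pl if x[f] <= thr else pr for f, thr, pl, pr in forest]
    return sum(votes) / len(votes)

def rerank(forest: List[Stump], candidates: Dict[str, List[float]]) -> List[str]:
    """Order candidate names by predicted near-native probability, highest first."""
    return sorted(candidates,
                  key=lambda name: predict_proba(forest, candidates[name]),
                  reverse=True)

# Hypothetical features per cluster center:
# [cluster size fraction, mean intra-cluster distance, mean distance to other centers]
X_train = [[0.40, 2.0, 6.0], [0.35, 2.5, 5.5], [0.30, 3.0, 5.0],
           [0.10, 5.0, 3.0], [0.08, 5.5, 2.5], [0.05, 6.0, 2.0]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = near-native (toy labels, assumed)

forest = fit_forest(X_train, y_train)
candidates = {"center_A": [0.03, 6.5, 1.5],
              "center_B": [0.45, 1.5, 6.5],
              "center_C": [0.20, 4.0, 4.0]}
ranking = rerank(forest, candidates)
print(ranking)
```

In practice a library random forest with deeper trees would replace the stump ensemble; the re-ranking step (sorting candidates by the classifier's positive-class probability) would stay the same.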

SUBMITTER: Wu H

PROVIDER: S-EPMC6929337 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature


Publications

Ranking near-native candidate protein structures via random forest classification.

Wu Hongjie, Huang Hongmei, Lu Weizhong, Fu Qiming, Ding Yijie, Qiu Jing, Li Haiou

BMC Bioinformatics, 2019 Dec 24, Suppl 25


PMID: 31874596

Similar Datasets

Project description: BACKGROUND: Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during protein association and the water contribution is ignored or only implicitly represented. Despite producing a number of acceptable complex predictions, most initial rigid docking algorithms to date still find it difficult, or even fail, to discriminate the correct predictions from incorrect or false-positive ones. To improve rigid docking results, re-ranking is an effective method that helps relocate the correct predictions to the top ranks, discriminating them from the incorrect ones. RESULTS: Our results showed that IFACEwat increased the number of near-native structures and improved their ranks compared to the initial rigid docking with ZDOCK3.0.2. In fact, IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. Compared to another re-ranking technique, ZRANK, IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) for medium and difficult cases, respectively. Compared with the latest published re-ranking method, F2Dock, IFACEwat performed equally well or even better for several Antigen/Antibody complexes. CONCLUSIONS: With the inclusion of interfacial water, IFACEwat improves most of the results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during protein interactions, which was ignored or not fully represented by the initial rigid docking and other re-ranking techniques. In addition, IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of near-native structures found. As our implementation has so far targeted improving the results of ZDOCK3.0.2, particularly for Antigen/Antibody complexes, further implementations applicable to other initial rigid docking algorithms are expected in the near future.

Project description: The inner surface of the retina contains a complex mixture of neurons, glia, and vasculature, including retinal ganglion cells (RGCs), the final output neurons of the retina and primary neurons that are damaged in several blinding diseases. The goal of the current work was two-fold: to assess the feasibility of using computer-assisted detection of nuclei and random forest classification to automate the quantification of RGCs in hematoxylin/eosin (H&E)-stained retinal whole-mounts; and if possible, to use the approach to examine how nuclear size influences disease susceptibility among RGC populations. To achieve this, data from RetFM-J, a semi-automated ImageJ-based module that detects, counts, and collects quantitative data on nuclei of H&E-stained whole-mounted retinas, were used in conjunction with a manually curated set of images to train a random forest classifier. To test performance, computer-derived outputs were compared to previously published features of several well-characterized mouse models of ophthalmic disease and their controls: normal C57BL/6J mice; Jun-sufficient and Jun-deficient mice subjected to controlled optic nerve crush (CONC); and DBA/2J mice with naturally occurring glaucoma. The result of these efforts was development of RetFM-Class, a command-line-based tool that uses data output from RetFM-J to perform random forest classification of cell type. Comparative testing revealed that manual and automated classifications by RetFM-Class correlated well, with 83.2% classification accuracy for RGCs. Automated characterization of C57BL/6J retinas predicted 54,642 RGCs per normal retina, and identified a 48.3% Jun-dependent loss of cells at 35 days post CONC and a 71.2% loss of RGCs among 16-month-old DBA/2J mice with glaucoma. Output from automated analyses was used to compare nuclear area among large numbers of RGCs from DBA/2J mice (n = 127,361). In aged DBA/2J mice with glaucoma, RetFM-Class detected a decrease in median and mean nucleus size of cells classified into the RGC category, as did an independent confirmation study using manual measurements of nuclear area demarcated by BRN3A-immunoreactivity. In conclusion, we have demonstrated that histology-based random forest classification is feasible and can be utilized to study RGCs in a high-throughput fashion. Despite having some limitations, this approach demonstrated a significant association between the size of the RGC nucleus and the DBA/2J form of glaucoma.
