
Dataset Information


Identify High-Quality Protein Structural Models by Enhanced K-Means.


ABSTRACT: Background. One critical issue in protein three-dimensional structure prediction, whether by ab initio or comparative modeling, is identifying high-quality protein structural models among the generated decoys. Clustering algorithms are currently widely used to identify near-native models; however, their performance depends on the conformational decoy set, and, for some algorithms, accuracy declines as the decoy population increases. Results. Here, we propose two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the second uses squared distance to optimize the initial centroids (K-means++). Our results show that SK-means and K-means++ were more robust than SPICKER alone: for 33 (59%) and 42 (75%) of 56 targets, respectively, they identified models with template modeling scores better than or equal to those of SPICKER. Conclusions. The classic K-means algorithm performed similarly to SPICKER, a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to SPICKER and classical K-means.
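
The abstract mentions that K-means++ uses squared distance to choose the initial centroids. A minimal sketch of that seeding step follows, assuming each decoy is represented as a plain numeric feature vector compared by squared Euclidean distance. The record does not state the distance measure or data representation used in the paper, so the names `squared_distance` and `kmeans_pp_seeds` and the toy points are hypothetical illustrations, not the authors' implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids with standard K-means++ seeding.

    Assumption: points are numeric tuples; a real decoy-clustering
    pipeline would likely use a structural distance instead.
    """
    rng = rng or random.Random(0)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # The first centroid is drawn uniformly at random.
    centroids = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest centroid.
    d2 = [squared_distance(p, centroids[0]) for p in points]

    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next centroid with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(nxt)
        d2 = [min(old, squared_distance(p, nxt)) for old, p in zip(d2, points)]
    return centroids


# Toy data: two well-separated groups of points (hypothetical values).
toy_points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
seeds = kmeans_pp_seeds(toy_points, k=2)
```

Because points already chosen have zero weight, later seeds tend to land far from earlier ones, which is the property that makes this initialization less sensitive to a poor random start than basic K-means.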

SUBMITTER: Wu H 

PROVIDER: S-EPMC5381204 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature


Publications

Identify High-Quality Protein Structural Models by Enhanced K-Means.

Hongjie Wu, Haiou Li, Min Jiang, Cheng Chen, Qiang Lv, Chuang Wu

BioMed Research International, 2017-03-22



Similar Datasets

S-EPMC5167671 | biostudies-literature
S-EPMC4147910 | biostudies-literature
S-EPMC8728224 | biostudies-literature
S-EPMC4999177 | biostudies-literature
S-EPMC3695499 | biostudies-literature
S-EPMC3089637 | biostudies-other
S-EPMC5804564 | biostudies-literature
S-EPMC1401231 | biostudies-literature
S-EPMC308794 | biostudies-literature
S-EPMC2885375 | biostudies-literature