Dataset Information

Identification of cyclin protein using gradient boost decision tree algorithm.

ABSTRACT: Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, thereby activating the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, so methods based on sequence similarity predict them poorly. A machine learning model for identifying cyclin proteins is therefore needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. These features were then optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that our model identifies cyclins with an accuracy of 93.06% and an AUC of 0.971, both higher than the values reported by the two recent studies on the same data.
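
The abstract names its encodings and the ANOVA ranking step but gives no formulas. As a rough illustration only, the sketch below implements two of the seven encodings (amino acid composition and composition of k-spaced amino acid pairs) and a one-way ANOVA F statistic for a single feature, using their commonly cited textbook definitions. The function names, the handling of non-standard residues, and the choice of k are assumptions made for this example; this is not the authors' code and may differ from their implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the 20 residue frequencies of seq, in AMINO_ACIDS order.

    Assumption: characters outside the 20 standard residues are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def cksaap(seq, k=0):
    """Return the 400 frequencies of residue pairs separated by k positions.

    k=0 counts adjacent pairs; k=1 counts pairs with one residue between them.
    """
    cleaned = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(cleaned) - k - 1  # number of k-spaced pairs in the sequence
    for i in range(max(total, 0)):
        counts[cleaned[i] + cleaned[i + k + 1]] += 1
    return [counts[p] / total if total > 0 else 0.0 for p in pairs]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of one feature across class labels.

    Assumption: at least two classes and more samples than classes.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n_samples, n_groups = len(values), len(groups)
    grand_mean = sum(values) / n_samples
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (n_groups - 1)) / (within / (n_samples - n_groups))
```

In a pipeline of the kind the abstract describes, features would be ranked by a score such as this F statistic and then added one at a time (incremental feature selection), keeping the subset on which the cross-validated classifier performs best. The mRMR step and the GBDT classifier are not shown here.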

SUBMITTER: Zulfiqar H

PROVIDER: S-EPMC8346528 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Identification of cyclin protein using gradient boost decision tree algorithm.

Hasan Zulfiqar, Shi-Shi Yuan, Qin-Lai Huang, Zi-Jie Sun, Fu-Ying Dao, Xiao-Long Yu, Hao Lin

Computational and Structural Biotechnology Journal, 2021-07-19

PMID: 34527186

Similar Datasets

Gradient boosting decision-tree-based algorithm with neuroimaging for personalized treatment in depression.
| S-EPMC9873411 | biostudies-literature

A decision tree searching strategy to boost the identification of cross-linked peptides
2020-11-23 | PXD018291 | Pride

Gradient Boosting Decision Tree Algorithm for the Prediction of Postoperative Intraocular Lens Position in Cataract Surgery.
| S-EPMC7757635 | biostudies-literature

Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree.
| S-EPMC5549711 | biostudies-other

CUDT: a CUDA based decision tree algorithm.
| S-EPMC4130321 | biostudies-other

ECG-based prediction algorithm for imminent malignant ventricular arrhythmias using decision tree.
| S-EPMC7224460 | biostudies-literature

Validation and refinement of cropland data layer using a spatial-temporal decision tree algorithm.
| S-EPMC8891360 | biostudies-literature

Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study.
| S-EPMC7136841 | biostudies-literature

The clinical decision analysis using decision tree.
| S-EPMC4251295 | biostudies-literature

Not that kind of tree: Assessing the potential for decision tree-based plant identification using trait databases.
| S-EPMC7394705 | biostudies-literature