Dataset Information

AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest.

ABSTRACT: Antimicrobial peptides (AMPs) are promising candidates in the fight against multidrug-resistant pathogens owing to their broad range of activities and low toxicity. Nonetheless, identifying AMPs through wet-lab experiments is still expensive and time-consuming. Here, we propose an accurate computational method for AMP prediction using the random forest algorithm. The prediction model is based on the distribution patterns of amino acid properties along the sequence. Using our collection of large and diverse sets of AMP and non-AMP data (3268 and 166791 sequences, respectively), we evaluated 19 random forest classifiers with different positive:negative data ratios by 10-fold cross-validation. Our optimal model, AmPEP with the 1:3 data ratio, showed high accuracy (96%), a Matthews correlation coefficient (MCC) of 0.9, an area under the receiver operating characteristic curve (AUC-ROC) of 0.99, and a Kappa statistic of 0.9. Descriptor analysis of AMP/non-AMP distributions by means of Pearson correlation coefficients revealed that reduced feature sets (from a full-featured set of 105 to a minimal-feature set of 23) can give comparable performance in all respects except for some reduction in precision. Furthermore, AmPEP outperformed existing methods in terms of accuracy, MCC, and AUC-ROC when tested on benchmark datasets.
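As a rough illustration of what "distribution patterns of amino acid properties along the sequence" can mean, the sketch below computes distribution-style descriptors for a single property and also the MCC from confusion-matrix counts. It is not the authors' code. The three-class hydrophobicity grouping, the choice of five positional marks (first, 25%, 50%, 75%, 100%), and the function names are all assumptions made for illustration.

```python
import math

# Assumed three-class hydrophobicity grouping (illustrative, not taken from this record).
HYDROPHOBICITY_CLASSES = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}


def distribution_descriptors(sequence, classes=HYDROPHOBICITY_CLASSES):
    """For each class, return five values: the position (as % of sequence length)
    of the first residue of that class and of the residues marking 25%, 50%,
    75% and 100% of that class's occurrences. Absent classes give zeros."""
    n = len(sequence)
    features = {}
    for name, members in classes.items():
        # 1-based positions of residues belonging to this class
        positions = [i + 1 for i, aa in enumerate(sequence) if aa in members]
        if not positions:
            features[name] = [0.0] * 5
            continue
        total = len(positions)
        # Index (1-based) of the occurrence that marks each fraction
        marks = [1] + [max(1, math.floor(total * f)) for f in (0.25, 0.5, 0.75, 1.0)]
        features[name] = [100.0 * positions[m - 1] / n for m in marks]
    return features


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(distribution_descriptors("KKLLKKLL"))
    print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))
```

Under these assumptions each property contributes 3 classes × 5 positions = 15 values, so seven such properties would give 105 values, which is consistent with the full feature set size quoted in the abstract. The resulting vectors could then be passed to any random forest implementation.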

SUBMITTER: Bhadra P

PROVIDER: S-EPMC5785966 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest.

Pratiti Bhadra, Jielu Yan, Jinyan Li, Simon Fong, Shirley W. I. Siu

Scientific Reports, 2018-01-26, issue 1

PMID: 29374199

Similar Datasets

AIPpred: Sequence-Based Prediction of Anti-inflammatory Peptides Using Random Forest.
S-EPMC5881105 | biostudies-literature

Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties.
S-EPMC4290662 | biostudies-literature

Using the Random Forest for Identifying Key Physicochemical Properties of Amino Acids to Discriminate Anticancer and Non-Anticancer Peptides.
S-EPMC10341712 | biostudies-literature

A Random-Forest Based Algorithm for Prediction of Enhancers From Histone Modifications
2012-05-10 | GSE37858 | GEO

A Random-Forest Based Algorithm for Prediction of Enhancers From Histone Modifications
2012-05-09 | E-GEOD-37858 | biostudies-arrayexpress

tRForest: a novel random forest-based algorithm for tRNA-derived fragment target prediction
2022-05-16 | GSE189510 | GEO

Antimicrobial Peptides Prediction method based on sequence multidimensional feature embedding
S-EPMC9714691 | biostudies-literature

Deciphering optimal molecular determinants of non-hemolytic, cell-penetrating antimicrobial peptides through bioinformatics and Random Forest.
S-EPMC11839508 | biostudies-literature

Prediction of antibacterial activity from physicochemical properties of antimicrobial peptides.
S-EPMC3237455 | biostudies-literature

Prediction of donor splice sites using random forest with a new sequence encoding approach.
S-EPMC4724119 | biostudies-literature