
Dataset Information


Identifying antimicrobial peptides using word embedding with deep recurrent neural networks.


ABSTRACT: MOTIVATION: Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences' low complexity and high variance, which frustrates sequence similarity-based searches. RESULTS: Here we use word embeddings of protein sequences to represent bacteriocins, and apply a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with a high probability, six yet unknown putative bacteriocins in Lactobacillus. Generalized, the representation of sequences with word embeddings preserving sequence order information can be applied to peptide and protein classification problems for which sequence similarity cannot be used. AVAILABILITY AND IMPLEMENTATION: Data and source code for this project are freely available at: https://github.com/nafizh/NeuBI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
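The abstract's central idea, that representing a peptide as an ordered series of "word" embeddings lets a recurrent network use amino acid order that an order-free summary throws away, can be illustrated with a small sketch. It assumes overlapping 3-mers as the words, uses random vectors as a hypothetical stand-in for trained embeddings, and uses an untrained single-layer Elman RNN with a sigmoid readout. None of this is the NeuBI implementation (the repository linked above contains that), and all helper names here are hypothetical.

```python
import math
import random

def to_kmers(seq, k=3):
    """Split a protein sequence into overlapping k-mer 'words' (k=3 is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def random_embeddings(words, dim=4, seed=0):
    """Hypothetical stand-in for pretrained k-mer embeddings: one random vector per k-mer."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in sorted(set(words))}

def embed_sequence(seq, table, k=3, dim=4):
    """Order-preserving representation: one vector per k-mer, kept in sequence order.
    k-mers missing from the table map to a zero vector."""
    return [table.get(w, [0.0] * dim) for w in to_kmers(seq, k)]

def bag_of_words(vectors):
    """Order-free baseline: element-wise sum of the k-mer vectors."""
    return [sum(col) for col in zip(*vectors)]

def rnn_score(vectors, hidden=3, seed=1):
    """Toy Elman RNN, h_t = tanh(W x_t + U h_{t-1}), then a sigmoid readout.
    Weights are random here; a real classifier would learn them from labelled peptides."""
    dim = len(vectors[0])
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    U = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]
    v = [rng.uniform(-1, 1) for _ in range(hidden)]
    h = [0.0] * hidden
    for x in vectors:
        # The comprehension reads the previous h before the new state is assigned.
        h = [math.tanh(sum(W[i][j] * x[j] for j in range(dim))
                       + sum(U[i][j] * h[j] for j in range(hidden)))
             for i in range(hidden)]
    logit = sum(vi * hi for vi, hi in zip(v, h))
    return 1.0 / (1.0 + math.exp(-logit))

if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example string, not a known bacteriocin
    table = random_embeddings(to_kmers(seq))
    vecs = embed_sequence(seq, table)
    rev = vecs[::-1]  # same k-mer vectors, opposite order
    same_bag = all(math.isclose(a, b, abs_tol=1e-9)
                   for a, b in zip(bag_of_words(vecs), bag_of_words(rev)))
    print("bag-of-words identical after reordering:", same_bag)  # True
    print("RNN scores:", rnn_score(vecs), rnn_score(rev))
```

Reordering the k-mer vectors leaves the summed representation unchanged but changes the RNN's output, which is the property the abstract relies on when it contrasts order-aware embeddings with approaches that ignore sequence order.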

SUBMITTER: Hamid MN 

PROVIDER: S-EPMC6581433 | biostudies-literature | 2019 Jun

REPOSITORIES: biostudies-literature


Publications

Identifying antimicrobial peptides using word embedding with deep recurrent neural networks.

Md-Nafiz Hamid, Iddo Friedberg

Bioinformatics (Oxford, England), 1 June 2019, issue 12



Similar Datasets

| S-EPMC6612824 | biostudies-other
| S-EPMC6805893 | biostudies-literature
| S-EPMC7797176 | biostudies-literature
| S-EPMC7931900 | biostudies-literature
| S-EPMC7256371 | biostudies-literature
| S-EPMC7813825 | biostudies-literature
| S-EPMC11317817 | biostudies-literature
| S-EPMC10909209 | biostudies-literature
| S-EPMC8944797 | biostudies-literature
| S-EPMC8075191 | biostudies-literature