Dataset Information

Amyloidogenic motifs revealed by n-gram analysis.


ABSTRACT: Amyloids are proteins associated with several clinical disorders, including Alzheimer's and Creutzfeldt-Jakob disease. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Because amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet providing the best performance in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked against the most popular tools for the detection of amyloid peptides using an external data set and obtained the highest values of performance measures (AUC: 0.90, MCC: 0.63). Our results showed sequential patterns in the amyloids that are strongly correlated with hydrophobicity, a tendency to form β-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that were previously confirmed experimentally. AmyloGram is available as a web server (http://smorfland.uni.wroc.pl/shiny/AmyloGram/) and as the R package AmyloGram. R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis .
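
To illustrate the idea of n-grams over a reduced amino acid alphabet, here is a minimal Python sketch. It is not the AmyloGram implementation (which is distributed as an R package): the three-group alphabet, the group labels "a"/"b"/"c", and the helper names `reduce_sequence` and `count_ngrams` are all hypothetical and chosen only for illustration. The sketch also counts only contiguous n-grams, which is an assumption about the feature type.

```python
from collections import Counter

# Hypothetical 3-letter reduced alphabet. This grouping is illustrative only
# and is NOT the alphabet selected for AmyloGram.
GROUPS = {
    "a": "AVLIMFWC",  # hydrophobic-leaning residues (assumed grouping)
    "b": "STNQGPYH",  # polar or small residues (assumed grouping)
    "c": "DEKR",      # charged residues (assumed grouping)
}

# Invert the grouping: one-letter amino acid code -> group label.
AA_TO_GROUP = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Translate a peptide into the reduced alphabet, residue by residue."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def count_ngrams(seq, n):
    """Count contiguous n-grams (substrings of length n) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


# Example peptide, used here only as sample input.
reduced = reduce_sequence("KLVFFA")
bigrams = count_ngrams(reduced, 2)
print(reduced, dict(bigrams))
```

Counts such as these, computed for every peptide in a data set, could then serve as the feature vector passed to a classifier such as a random forest.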

SUBMITTER: Burdukiewicz M 

PROVIDER: S-EPMC5636826 | biostudies-other | 2017 Oct

REPOSITORIES: biostudies-other

Publications

Amyloidogenic motifs revealed by n-gram analysis.

Burdukiewicz M, Sobczyk P, Rödiger S, Duda-Madej A, Mackiewicz P, Kotulska M

Scientific Reports, 2017-10-11, issue 1

Similar Datasets

| S-EPMC3697852 | biostudies-literature
| S-EPMC4143503 | biostudies-literature
| S-EPMC6484272 | biostudies-literature
| S-EPMC3174529 | biostudies-literature
| S-EPMC6213860 | biostudies-other
2022-05-11 | GSE202575 | GEO
2018-09-30 | E-MTAB-6876 | biostudies-arrayexpress
| S-EPMC1779570 | biostudies-literature
| S-EPMC4094474 | biostudies-literature
| S-EPMC7950326 | biostudies-literature