Proteomics

Dataset Information


Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy


ABSTRACT: Top-down mass spectrometry (MS) is a powerful tool for the identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. However, top-down MS experiments generate complex datasets that require several sequential data processing steps for interpretation. Deconvolution of the complex isotopic distributions that arise from naturally occurring isotopes is a critical step in this processing. Multiple algorithms are currently available to deconvolute top-down mass spectra; however, each algorithm generates a different deconvoluted peak list, with varying accuracy compared to true positive annotations. In this study, we designed a machine learning strategy that can process and combine the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from the THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. The random forest model outperformed the single best algorithm. This machine learning strategy could enhance the accuracy and confidence of protein identification during database search by accelerating detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise for enhancing proteoform identification and characterization in high-throughput top-down proteomics data analysis.
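The simple voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vote_consensus`, the greedy mass-based clustering, the 10 ppm tolerance, and the example masses are all assumptions made for illustration. The idea shown is only that peaks reported by several deconvolution algorithms are grouped by mass, and a group is kept when enough distinct algorithms support it.

```python
from statistics import mean


def vote_consensus(peak_lists, ppm_tol=10.0, min_votes=2):
    """Combine peak lists by simple voting (illustrative sketch only).

    peak_lists: dict mapping an algorithm name to a list of monoisotopic
                masses in Da (hypothetical input format).
    ppm_tol:    assumed tolerance for grouping peaks, in parts per million.
    min_votes:  number of distinct algorithms required to keep a cluster.
    Returns a list of (mean_mass, vote_count) tuples sorted by mass.
    """
    # Flatten to (mass, algorithm) pairs sorted by mass.
    tagged = sorted(
        (mass, algo) for algo, masses in peak_lists.items() for mass in masses
    )

    # Greedy clustering: a peak joins the current cluster when it lies
    # within ppm_tol of that cluster's most recent (largest) member.
    clusters = []
    for mass, algo in tagged:
        if clusters and (mass - clusters[-1][-1][0]) / mass * 1e6 <= ppm_tol:
            clusters[-1].append((mass, algo))
        else:
            clusters.append([(mass, algo)])

    # Vote: count distinct algorithms per cluster and apply the threshold.
    consensus = []
    for cluster in clusters:
        voters = {algo for _, algo in cluster}
        if len(voters) >= min_votes:
            consensus.append((mean(m for m, _ in cluster), len(voters)))
    return consensus


# Made-up masses, for illustration only.
example = {
    "THRASH": [10000.00, 12000.50],
    "TopFD": [10000.02, 15000.00],
    "MS-Deconv": [10000.05, 12000.52],
    "SNAP": [9000.00],
}
for mass, votes in vote_consensus(example, min_votes=2):
    print(f"{mass:.3f} Da supported by {votes} algorithms")
```

Raising `min_votes` corresponds to a stricter consensus threshold: fewer peaks survive, but each is supported by more algorithms. A learned model such as a random forest would replace the fixed vote count with a classifier trained on per-cluster features.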

INSTRUMENT(S): Bruker Daltonics solarix series

ORGANISM(S): Macaca mulatta (Rhesus macaque)

TISSUE(S): Skeletal Muscle Fiber

SUBMITTER: Zhijie Wu  

LAB HEAD: Sean J McIlwain

PROVIDER: PXD018043 | Pride | 2020-05-06

REPOSITORIES: pride


Publications

Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy.

Sean J McIlwain, Zhijie Wu, Molly Wetzel, Daniel Belongia, Yutong Jin, Kent Wenger, Irene M Ong, Ying Ge

Journal of the American Society for Mass Spectrometry, 2020-04-08, issue 5


Top-down mass spectrometry (MS) is a powerful tool for the identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. However, the complex data set generated from top-down MS experiments requires multiple sequential data processing steps to successfully interpret the data for identifying and characterizing proteoforms. One critical step is the deconvolution of the complex isotopic distribution that ...

Similar Datasets

2022-05-17 | GSE203061 | GEO
2021-09-09 | PXD020615 | Pride
2008-04-04 | E-TABM-255 | biostudies-arrayexpress
2019-07-18 | GSE134056 | GEO
2019-07-18 | GSE134052 | GEO
| 46649 | ecrin-mdr-crc
2024-12-09 | BIOMD0000001067 | BioModels
2022-09-13 | PXD018996 | Pride
2021-02-02 | GSE162164 | GEO
2007-05-11 | E-TABM-125 | biostudies-arrayexpress