Proteomics

Dataset Information


A transformer model for de novo sequencing of data-independent acquisition mass spectrometry data


ABSTRACT: A core computational challenge in the analysis of mass spectrometry data is the de novo sequencing problem, in which the generating amino acid sequence is inferred directly from an observed fragmentation spectrum without the use of a sequence database. Recently, deep learning models have made significant advances in de novo sequencing by learning from massive datasets of high confidence labeled mass spectra. However, these methods are primarily designed for data-dependent acquisition (DDA) experiments. Over the past decade, the field of mass spectrometry has been moving toward using data-independent acquisition (DIA) protocols for the analysis of complex proteomic samples due to their superior specificity and reproducibility. Hence, we present a new de novo sequencing model called Cascadia, which uses a transformer architecture to handle the more complex data generated by DIA protocols. In comparisons with existing approaches for de novo sequencing of DIA data, Cascadia achieves improved performance across a range of instruments and experimental protocols. Additionally, we demonstrate Cascadia’s ability to accurately discover de novo coding variants and peptides from the variable region of antibodies.
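The abstract frames de novo sequencing as inferring a peptide directly from its fragmentation spectrum. As background, here is a minimal sketch of the mass bookkeeping that such inference is checked against. It computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses and counts how many observed peaks a candidate peptide explains. This is not Cascadia's transformer model and does not handle DIA multiplexing. The helper names `fragment_ions` and `matched_peaks`, the 0.02 Da tolerance, and the restriction to unmodified residues are illustrative assumptions. Leucine and isoleucine have identical mass, so a mass-based score like this cannot tell them apart.

```python
# Sketch only: hypothetical helpers, not Cascadia's method.
# Monoisotopic residue masses in daltons (unmodified residues assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged b-ions (b1..b(n-1)) and y-ions (y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:  # N-terminal prefixes
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):  # C-terminal suffixes
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


def matched_peaks(peptide: str, observed_mz: list[float], tol: float = 0.02) -> int:
    """Count observed peaks within `tol` Da of any theoretical b- or y-ion."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(
        1 for mz in observed_mz if any(abs(mz - t) <= tol for t in theoretical)
    )


if __name__ == "__main__":
    # Synthetic, noise-free spectrum generated from PEPTIDE for illustration.
    b, y = fragment_ions("PEPTIDE")
    spectrum = b + y
    for candidate in ["PEPTIDE", "PEPTLDE", "EPPTIDE"]:
        print(candidate, matched_peaks(candidate, spectrum))
```

Because I and L are isobaric, PEPTIDE and PEPTLDE explain the synthetic spectrum equally well. Learned models such as the one described above go beyond peak counting, but they face the same mass-level ambiguities.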

ORGANISM(S): Homo sapiens, Mus musculus

SUBMITTER: Michael MacCoss  

PROVIDER: PXD053291 | panorama | 2024-06-21

REPOSITORIES: PanoramaPublic

Similar Datasets

2018-05-16 | MSV000082368 | MassIVE
2023-01-16 | PXD037803 | Pride
2024-10-10 | PXD050548 | Pride
2016-01-12 | PXD003317 | Pride
2020-10-20 | MSV000086336 | MassIVE
2022-01-06 | PXD027589 | Pride
2024-02-01 | MSV000093979 | MassIVE
2011-09-01 | E-GEOD-26514 | biostudies-arrayexpress
2014-01-16 | MSV000078530 | MassIVE
2018-11-25 | E-MTAB-7351 | biostudies-arrayexpress