Retention Time Prediction Using Neural Networks Increases Identifications in Crosslinking Mass Spectrometry
Ontology highlight
ABSTRACT: Abstract:
Crosslinking mass spectrometry (Crosslinking MS) has developed into a robust technique that is increasingly used to investigate the interactomes of organelles and cells. However, the incomplete and noisy information contained in spectra limits especially the identification of heteromeric protein-protein interactions (PPIs) from the many theoretically possible PPIs. We successfully leveraged here chromatographic retention time (RT) to complement the mass spectrometry-centric identification process. For this, we first made crosslinked peptides amenable to RT prediction, through a Siamese neural network, and then added RT information to the identification process. Our multi-task machine learning model xiRT achieved highly accurate predictions in a multi-dimensional separation experiment of crosslinked E. coli lysate conducted for this study. We combined strong cation exchange (SCX), hydrophilic strong anion exchange (hSAX) and reversed-phase (RP) chromatography and reached R^2 0.94 in RP and a margin of error of 1 fraction for hSAX in 94%, and SCX in 85% of the cases. Importantly, supplementing the search engine score with retention time features led to a 1.4-fold increase in PPIs, at 1% PPI false discovery rate (FDR). We also demonstrated the value of this approach for the more routine analysis of multiprotein complexes. In the Fanconi anaemia monoubiquitin ligase complex, an increase of 1.7-fold in heteromeric residue-pairs was achieved at 1% residue-pair FDR, solely using reversed-phase RT. Retention times therefore proved to be a powerful complement to mass spectrometric information to improve the identification of crosslinked peptides. We envision xiRT to supplement search engines in their scoring routines to increase the sensitivity of Crosslinking MS analyses especially for protein-protein interactions.
Conclusion:
Using a Siamese network architecture, we succeeded in bringing RT prediction into the Crosslinking MS field, independent of separation setup and search software. Our open source application xiRT introduces the concept of multi-task learning to achieve multi-dimensional chromatographic retention time prediction, and may use any peptide sequence-dependent measure including for example collision cross section or isoelectric point. The black-box character of the neural network was reduced by means of interpretable machine learning that revealed individual amino acid contributions towards the separation behavior. The RT predictions – even when using only the RP dimension – complement mass spectrometric information to enhance the identification of heteromeric crosslinks in multiprotein complex and proteome-wide studies. Overfitting does not account for this gain as known false target matches from an entrapment database did not increase. Leveraging additional information sources may help to address the mass-spectrometric identification challenge of heteromeric crosslinks.
ORGANISM(S): Escherichia Coli
SUBMITTER: Juri Rappsilber
PROVIDER: PXD020407 | JPOST Repository | Mon Apr 19 00:00:00 BST 2021
REPOSITORIES: jPOST
ACCESS DATA