Comparative study of the data analysis tools for data-independent acquisition mass spectrometry
ABSTRACT: In-house generated datasets of different gradient lengths were used to compare the performance of five DIA-MS analysis tools (OpenSWATH, EncyclopeDIA, Skyline, DIA-NN, Spectronaut).
Project description: The consistent and accurate quantification of proteins is a challenging task for mass spectrometry (MS)-based proteomics. SWATH-MS uses data-independent acquisition (DIA) for label-free quantification. Here we evaluated five software tools for processing SWATH-MS data (OpenSWATH, SWATH2.0, Skyline, Spectronaut and DIA-Umpire) in collaboration with the respective developers to ensure optimal use of each tool. We analyzed data from hybrid proteome samples of defined quantitative composition, acquired on two different MS instruments with different SWATH isolation window setups. Using the resulting high-complexity datasets, we benchmarked the precision and accuracy of quantification and evaluated the identification performance, robustness and specificity of each software tool. To evaluate the high-complexity datasets consistently, we developed the LFQbench R package. LFQbench results enabled developers to improve their software tools, underlining the value of the reference datasets for software development and benchmarking. All tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.
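The accuracy and precision benchmark described above can be illustrated with a minimal Python sketch. It is not the LFQbench API (LFQbench is an R package); the function names, the input intensities and the expected ratio are hypothetical, and the metric definitions are simplifying assumptions: accuracy is taken as the deviation of the median log2 ratio from the expected log2 ratio, and precision as the standard deviation of the log2 ratios.

```python
import math
import statistics


def log2_ratios(sample_a, sample_b):
    """Per-protein log2(A/B) ratios; pairs with a non-positive value are skipped."""
    return [
        math.log2(a / b)
        for a, b in zip(sample_a, sample_b)
        if a > 0 and b > 0
    ]


def accuracy_and_precision(ratios, expected_log2_ratio):
    """Return (bias, spread) for one species of a hybrid proteome sample.

    bias   : median observed log2 ratio minus the expected log2 ratio
    spread : sample standard deviation of the observed log2 ratios
    Both definitions are assumptions made for illustration.
    """
    bias = statistics.median(ratios) - expected_log2_ratio
    spread = statistics.stdev(ratios)
    return bias, spread


# Hypothetical intensities for three proteins of one species, mixed 2:1 (A:B).
intensities_a = [2000.0, 4100.0, 7900.0]
intensities_b = [1000.0, 2000.0, 4000.0]

ratios = log2_ratios(intensities_a, intensities_b)
bias, spread = accuracy_and_precision(ratios, expected_log2_ratio=1.0)
print(f"bias = {bias:.3f} log2 units, spread = {spread:.3f} log2 units")
```

A bias near zero indicates accurate quantification for that species, and a small spread indicates precise quantification.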
Project description: Reanalysis of the Kawashima et al. dataset with DIA-NN and Spectronaut. Running the tools with different settings shows how the reported CVs can be altered by the software parameters.
Project description: Generation of a new library of targeted mass spectrometry assays for accurate protein quantification in triple negative breast cancer (TNBC) tissues. Primary tumor tissue lysates from 105 TNBC patients treated at Masaryk Memorial Cancer Institute (MMCI) in Brno, Czech Republic, were used to generate the spectral library. This project covers: raw files from DDA-PASEF (data-dependent acquisition, parallel accumulation-serial fragmentation) measurements of 12 hydrophilic chromatography (HILIC) fractions of an aliquot pool from the complete set of 105 samples, measured on a timsTOF Pro; raw files of 16 individual samples measured in DIA-PASEF (data-independent acquisition PASEF) mode and used for hybrid library generation and for demonstrative quantitative DIA data extraction; and the Pulsar archive generated in Spectronaut 16.0 from the 12 DDA-PASEF measurements of HILIC fractions and the 16 DIA-PASEF measurements of individual samples. The 16 DIA-PASEF runs of individual samples used for library generation were analyzed with the newest versions of Spectronaut (version 18.5) and DIA-NN (version 1.8.1), both in a library-based setting using the newly generated library and in a library-free setting. The library-based method outperformed the use of predicted libraries in terms of identification numbers.
Project description: Owing to technical advances in mass spectrometers, especially high scanning speed and high MS/MS resolution, data-independent acquisition mass spectrometry (DIA-MS) has become widely used and enables high reproducibility in both proteomic identification and quantification. Current DIA-MS methods normally cover a wide mass range, with the aim of targeting and identifying as many peptides and proteins as possible, and therefore regularly generate MS/MS spectra of high complexity. In this report, we assessed the performance and benefits of using small windows, e.g. of 5-Da width, across the peptide elution time. We also devised a new DIA method named RTwinDIA that schedules the small isolation windows in different retention time blocks. We applied the MaxQuant and pFind database search tools, and further used Spectronaut with an external comprehensive spectral library of human proteins to perform direct proteomic identification. We conclude that software such as pFind has potential for directly analyzing DIA data acquired with small windows, and that instrument time and DIA cycle time should preferably be spent on small windows rather than on covering a broad mass range with large windows, in order to improve proteome coverage for new biological samples and to increase quantitative precision. These results further provide perspectives on the future merging of DDA and DIA on faster MS analyzers.
Project description: Quantitative cross-linking/mass spectrometry (QCLMS) provides increasing structural detail on altered protein states in solution. Accurate quantitation is a value in itself but may also be central to elucidating small differences between protein states. Hence, QCLMS could benefit from data-independent acquisition (DIA), which generally provides higher reproducibility than data-dependent acquisition (DDA) and higher throughput than targeted methods. We therefore open DIA to QCLMS by extending a widely used DIA software tool, Spectronaut, to also accommodate cross-link data. A mixture of seven proteins cross-linked with bis[sulfosuccinimidyl] suberate (BS3) was used to evaluate this workflow. Of the 414 identified unique residue pairs, 292 (70%) were quantifiable across triplicates with a coefficient of variation (CV) of 9.8%, with manual correction of peak selection and boundaries for PSMs in the lower quartile of individual CV values. This compares favourably to DDA, where we previously quantified only 63% of the identified cross-links across triplicates with a CV of 14%, for a single protein and with complete manual data curation. DIA-QCLMS shows promise for detecting differential abundance of cross-linked peptides in complex mixtures, despite the ratio compression encountered when sample complexity was increased by adding E. coli cell lysate as matrix. In conclusion, the DIA software Spectronaut can now be used for cross-linking, and DIA is indeed able to improve QCLMS.
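The coefficient of variation quoted above is the standard deviation of replicate intensities divided by their mean. The following Python sketch shows how a per-residue-pair CV and the set of pairs quantified in all replicates could be computed. It is an illustration only, not the workflow used in the study: the residue-pair identifiers, the intensities and the function names are hypothetical, and using the median to summarise CVs is an assumption.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100.0 * statistics.stdev(values) / mean


def quantified_in_all_replicates(table, n_replicates=3):
    """Keep only entries that have a measured (non-None) intensity in every replicate."""
    return {
        key: values
        for key, values in table.items()
        if len(values) == n_replicates and all(v is not None for v in values)
    }


# Hypothetical triplicate intensities keyed by residue pair.
intensities = {
    "ProtA:K10-ProtA:K55": [90.0, 100.0, 110.0],
    "ProtA:K12-ProtB:K7": [200.0, 210.0, 190.0],
    "ProtB:K3-ProtB:K40": [50.0, None, 55.0],  # missing in one replicate
}

complete = quantified_in_all_replicates(intensities)
cvs = {pair: cv_percent(values) for pair, values in complete.items()}
print(f"{len(complete)} of {len(intensities)} residue pairs quantified in all replicates")
print(f"median CV = {statistics.median(cvs.values()):.1f}%")
```

Only residue pairs measured in every replicate contribute to the CV, which matches the idea of being "quantifiable across triplicates".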
Project description: With a multitude of DIA approaches developed recently, many of their features are not yet well explored. In this study we use a biologically relevant sample, the synaptosome, as a model to compare the spectral libraries generated by (1) DDA solely on the Orbitrap (OT), (2) the Orbitrap for MS1 scans and the ion trap for MS/MS acquisition (IT), and (3) directDIA derived from DIA data. Next, we compared the number of spectral library features recovered and the quality of quantification obtained by DIA and WiSIM-DIA, and suggest future improvements.
Project description: Here, we present a standardized, “off-the-shelf” proteomics pipeline working in a single 96-well plate to achieve deep coverage of cellular proteomes with high throughput and scalability. This integrated pipeline streamlines a fully automated sample preparation platform, data-independent acquisition (DIA) coupled with a high-field asymmetric waveform ion mobility spectrometry (FAIMS) interface, and an optimized library-free DIA database search strategy. Our systematic evaluation of FAIMS-DIA showed that a single compensation voltage (CV) of -35 V not only yields the deepest proteome coverage but also correlates best with DIA without FAIMS. Our in-depth comparison of direct-DIA database search engines showed that Spectronaut outperforms the others, providing the highest number of quantifiable proteins. Next, we applied three common DIA strategies to characterize human induced pluripotent stem cell (iPSC)-derived neurons and show that single-shot MS using single-CV (-35 V) FAIMS-DIA results in >9,000 quantifiable proteins with <10% missing values, as well as superior reproducibility and accuracy compared with other existing DIA methods.
Project description: CRISPR-Cas gene editing holds substantial promise in many biomedical disciplines and in basic research. Because of the important functional implications of non-histone chromosomal protein HMG-14 (HMGN1) in regulating chromatin structure and tumor immunity, we knocked out HMGN1 by CRISPR in cancer cells and studied the resulting proteomic regulation events. In particular, we used DIA mass spectrometry (DIA-MS) and reproducibly measured more than 6200 proteins (protein FDR 1%) and more than 82,000 peptide precursors in single MS shots of two hours. HMGN1 protein deletion following CRISPR was confidently verified by DIA-MS in all clone and dish replicates. Statistical analysis revealed 144 proteins whose expression changed significantly after HMGN1 knockout. Functional annotation and enrichment analysis indicate that the deletion of HMGN1 induces histone inactivation, various stress pathways, remodeling of extracellular proteomes, and immune regulation processes related to the complement and coagulation cascade and the interferon alpha/gamma response in cancer cells. These results shed new light on the cellular functions of HMGN1. We suggest that DIA-MS can be reliably used as a rapid, robust, and cost-effective proteomic screening tool to assess the outcome of CRISPR experiments.
Project description: In order to compare workflows for the acquisition and processing of proteomic data acquired in data-independent acquisition (DIA) mode, a proteomic standard was generated by spiking the 48 human proteins of UPS1 (Sigma) into a whole cell extract of E. coli at 8 different concentrations ranging from 0.1 to 50 fmol of UPS1/µg of E. coli. Each sample was trypsin-digested and analyzed in triplicate on an Orbitrap Fusion instrument (Thermo) operating in DIA mode with four different precursor window schemes (narrow, wide, mixed or overlapped). These 4 x 24 raw files were then analyzed with 6 different DIA software tools (Spectronaut, ScaffoldDIA, Skyline, DIA-Umpire, OpenSWATH and DIA-NN), with or without a fractionated E. coli library. Here we deposit: the 96 Thermo raw files of the analysis as well as the corresponding converted .mzML and .mzXML files; the 49 DDA .raw files for the spectral library, composed of the 48 fractions of E. coli whole cell extract + 200 fmol/µg of UPS1 proteins; the spectral library generated by the DDA analysis (.blib and .tsv files generated with Skyline, .tsv file from Spectronaut); the spectral library generated with the fasta file in Prosit; the spectral library generated from the 24 narrow-window DIA analyses and the fasta file in DIA-NN and MSFragger (DIA-Umpire SE module); the fasta file; the software tool files (Spectronaut .sne files, Skyline .sky files and ScaffoldDIA .sdia files); and the raw outputs of the tools and the post-processed precursor quantification tables (normalized, with imputed missing values) for the 4 acquisition modes (5 with the use of a peptide library and 4 with a search against Human + E. coli fasta files).
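The run count in the design above follows from 8 concentrations x 3 replicates = 24 runs per window scheme, and 4 schemes x 24 = 96 raw files. A short Python sketch that enumerates the design makes this arithmetic explicit. The concentration levels are represented by index only, because the description gives just the range (0.1 to 50 fmol/µg) and not the individual values; the variable names are hypothetical.

```python
from itertools import product

WINDOW_SCHEMES = ["narrow", "wide", "mixed", "overlapped"]
N_CONCENTRATIONS = 8  # levels spanning 0.1 to 50 fmol UPS1 per µg E. coli
N_REPLICATES = 3      # each sample analyzed in triplicate

# One entry per DIA run: (window scheme, concentration index, replicate index).
runs = list(
    product(
        WINDOW_SCHEMES,
        range(1, N_CONCENTRATIONS + 1),
        range(1, N_REPLICATES + 1),
    )
)

runs_per_scheme = len(runs) // len(WINDOW_SCHEMES)
print(f"{runs_per_scheme} runs per window scheme, {len(runs)} raw files in total")
```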