Dataset Information

Optimized data analysis avoiding trypsin artefacts

ABSTRACT: Most bottom-up proteomics experiments share two features: the use of trypsin to digest proteins for mass spectrometry and the statistics-driven matching of measured peptide fragment spectra against in silico spectra generated from a protein database. While this extremely powerful approach, in combination with latest-generation mass spectrometers, facilitates very deep proteome coverage, the assumptions made have to be met to generate meaningful results. One of these assumptions is that the measured spectra indeed have a match in the search space, since the search engine will always report the best match. However, one of the most abundant proteins in the sample, the protease, is often not represented in the employed database. It is therefore widely accepted in the community to include the protease and other common contaminants in the database to avoid false-positive matches. Although this approach accounts for unmodified trypsin peptides, the most widely employed trypsin preparations are chemically modified to prevent autolysis and premature activity loss of the protease. In this study we observed numerous spectra of modified trypsin-derived peptides in samples from our laboratory as well as in datasets downloaded from public repositories. In many cases the spectra were assigned to other proteins, often with good statistical significance. We therefore designed a new database search strategy employing an artificial amino acid, which accounts for these peptides with only a minimal increase in search space and in the concomitant loss of statistical significance. Moreover, this approach can be easily implemented into existing workflows for many widely used search engines.
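
The artificial amino acid idea from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the chemical modification is dimethylation of lysine (+28.0313 Da), uses the letter `J` as a hypothetical placeholder residue, and uses a made-up peptide `AKGR`. None of these choices is stated in the record above.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Assumption: the modification is lysine dimethylation (+C2H4).
DIMETHYL_SHIFT = 28.03130
# Hypothetical placeholder letter for the modified lysine.
PLACEHOLDER = "J"
RESIDUE_MASS[PLACEHOLDER] = RESIDUE_MASS["K"] + DIMETHYL_SHIFT


def mask_lysines(sequence: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace every lysine in a sequence with the placeholder residue."""
    return sequence.replace("K", placeholder)


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of a linear peptide (sum of residues plus water)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Made-up peptide used purely for illustration.
original = "AKGR"
masked = mask_lysines(original)
delta = peptide_mass(masked) - peptide_mass(original)
print(masked, round(delta, 4))
```

Under these assumptions, only the protease entry in the FASTA file would carry the placeholder, so modified lysines would not need to be searched as a variable modification on every protein. This is one plausible reading of the abstract's "minimal increase in search space".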

INSTRUMENT(S): LTQ Orbitrap Velos

ORGANISM(S): Saccharomyces cerevisiae (baker's yeast)

SUBMITTER: Katarina Fritz

LAB HEAD: Ruth Birner-Gruenberger

PROVIDER: PXD002726 | Pride | 2016-04-15

REPOSITORIES: pride

ACCESS DATA

Dataset's files

File	Type
KFSwYeastCrTry_150506.fasta	FASTA
Modified_trypsin_peptides_Yeast_In-gel_digest_1-3.mzid.gz	mzIdentML
Modified_trypsin_peptides_Yeast_In-gel_digest_1-3.pride.mztab.gz	mzTab
Yeast_In-gel_digest_1.mgf	MGF
Yeast_In-gel_digest_1.pride.mgf.gz	MGF

Files 1 - 5 of 18 shown.

Publications

Cleaning out the Litterbox of Proteomic Scientists' Favorite Pet: Optimized Data Analysis Avoiding Trypsin Artifacts.

Schittmayer M, Fritz K, Liesinger L, Griss J, Birner-Gruenberger R

Journal of Proteome Research, 2016-03-22, issue 4

Chemically modified trypsin is a standard reagent in proteomics experiments but is usually not considered in database searches. Modification of trypsin is supposed to protect the protease against autolysis and the resulting loss of activity. Here, we show that modified trypsin is still subject to self-digestion and that, as a result, modified trypsin-derived peptides are present in standard digests. We show that these peptides commonly lead to false-positive assignments even if native trypsin is c ...

PMID: 26938934

Similar Datasets

Project description: Here we present the data obtained from a label-free quantitative proteomics analysis of soluble spinal cord extract derived from a mouse model of multiple sclerosis (EAE) and from sham-induced mice. Samples were prepared offline using the FASP approach and then submitted for nano-LC-MS/MS analysis on an Orbitrap Velos instrument. After statistical evaluation of the data, 431 differentially expressed proteins (KS-test, p < 0.05) out of a total of ~1400 unique proteins were identified in the comparative spinal cord analysis (peptide FDR = 0.55%).
Database search and protein identification: Tandem mass spectra were extracted from .RAW files and searched using the SEQUEST-PVM v.27 (rev.9) (Eng et al., 1994) database search program against a mouse protein database downloaded as FASTA-formatted sequences from EBI-IPI (database version 3.72), which contains 56957 entries (with priority given to UniProt identifiers) as well as reverse decoy sequences to empirically assess the false identification rate. This search program was executed on a cluster computer to match the MS/MS spectra to the corresponding most highly correlated peptide sequences. Mass tolerances for precursor (MS) and product ions (MS/MS) were set to 3 and 0 m/z, respectively. Searches were performed with the enzyme selectivity set to trypsin with one missed cleavage allowed, and protein modifications included fixed carbamidomethylation of cysteines (57 Da). Match likelihoods were assigned a statistical confidence score using the STATQUEST probabilistic model (Kislinger et al., 2003), and candidate peptide identifications were filtered using an estimated peptide confidence score of ≥95%. A 10 ppm high-accuracy mass filter accounting for isotopic shifts in the spectra was applied after the SEQUEST analysis, thus improving the fidelity of protein identifications.
Protein quantitation: To estimate relative protein levels, spectral counts were transformed into normalised spectral abundance factors (NSAF) as previously described (Mosley et al., 2009). Briefly, the spectral count (SC) of a protein is divided by its length (Mw), and this value is then normalised to the sum of all SC/Mw values.
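
The NSAF calculation described above can be sketched as follows. This is a minimal illustration, not the pipeline used for that dataset. The function name `nsaf`, the generic `sizes` argument (standing in for the length/Mw term in the text), and the example numbers are all hypothetical.

```python
def nsaf(spectral_counts, sizes):
    """Return normalised spectral abundance factors keyed by protein.

    spectral_counts: mapping protein -> spectral count (SC)
    sizes: mapping protein -> size term used for normalisation
           (the text's length/Mw term; which one is an assumption)
    """
    # Size-normalised spectral count (SC / size) for each protein.
    saf = {p: spectral_counts[p] / sizes[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero; NSAF is undefined")
    # Divide by the sum over all proteins so the values add up to 1.
    return {p: value / total for p, value in saf.items()}


# Hypothetical example data, not taken from the dataset.
counts = {"P1": 10, "P2": 30, "P3": 5}
sizes = {"P1": 100, "P2": 600, "P3": 50}
values = nsaf(counts, sizes)
for protein, value in values.items():
    print(protein, round(value, 3))
```

Because each value is divided by the sum over all proteins, the NSAF values within one sample add up to 1, which makes them comparable across runs with different total spectral counts.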

Project description: Data analysis. Spot detection and matching were performed with a comparative cross analysis of all the gels using DeCyder software v.6.5 (GE Healthcare). 178 spots were selected based on a 1.15-fold protein ratio cut-off, allowing for the appearance of the spots in 23 out of 28 gels (69 out of 84 total images). Data from 95 spots were submitted. 93 spots were identified with high confidence.
Spot picking and trypsin digestion. The spots of interest were picked by an Ettan Spot Picker (GE Healthcare) based on the in-gel analysis and the spot picking design from the DeCyder software. The gel spots were washed a few times and then digested in-gel with modified porcine trypsin protease (Promega, Fitchburg, WI). The digested tryptic peptides were desalted using a Zip-tip C18 (Millipore, Billerica, MA). Peptides were eluted from the Zip-tip with 0.5 µL of matrix solution (alpha-cyano-4-hydroxycinnamic acid, 5 mg/mL in 50% acetonitrile, 0.1% trifluoroacetic acid, 25 mM ammonium bicarbonate) and spotted on a MALDI plate.
Mass spectrometry. MALDI-TOF MS and TOF/TOF tandem MS/MS were performed on an AB SCIEX TOF/TOF 5800 System (AB SCIEX). MALDI-TOF mass spectra were acquired in reflectron positive ion mode, averaging 4000 laser shots per spectrum. TOF/TOF tandem MS fragmentation spectra were acquired for each sample, averaging 4000 laser shots per fragmentation spectrum on each of the 7-10 most abundant ions present in each sample (excluding trypsin autolytic peptides and other known background ions).
Database search. Both the resulting peptide masses and the associated fragmentation spectra were submitted to a GPS Explorer workstation equipped with the MASCOT search engine (Matrix Science, Boston, MA) to search the Swiss-Prot database. Searches were performed without constraining protein molecular weight or isoelectric point, with variable carbamidomethylation of cysteine and oxidation of methionine residues, and with one missed cleavage allowed in the search parameters.
Candidates with either a protein score C.I.% or an ion C.I.% greater than 95 were considered significant. When multiple IDs were significant for a given spot, the selection was made by evaluating apparent molecular weight, isoelectric point, the location of the spot in the gel, and the presence of strips of multiple protein isoforms in the adjacent spots.

Project description: The continuing desire to obtain more quantitative and detailed information from cellular proteomics experiments with regard to proteome coverage and protein modifications calls for a systematic investigation of protein extraction and digestion protocols, in particular when working with unicellular organisms with a strong cell wall such as the model yeast S. cerevisiae. The selection of a suitable sample preparation workflow is crucial to obtain in-depth proteome coverage, as the use of certain sample preparation protocols may bias the protein identification or induce (unexpected) peptide modifications. Here, we made an extensive comparison of preparation workflows commonly applied to S. cerevisiae. Workflows were compared on the basis of identified MS/MS spectra, peptide sequences, and the number and type of modifications, using both restricted (Peaks) and unrestricted (TagGraph) database search approaches. The proteome coverage was mainly affected by the sample collection approach, and it was maximized using FASP. Extensive reagent-specific peptide modifications were detected when using formic acid, but also when using acetone. Such artefacts split the analyte mass signals and generate additional chemical noise that may also elute differently compared to the native peptide. The use of both an unrestricted and a restricted database search increased identification rates significantly and resulted in the identification of approximately 70% of the MS2 spectra for the best protocol. The unidentified spectra were assessed by their de novo sequencing score. This confirmed that the majority of those spectra were of very low quality, insufficient to match to database sequences. However, a small fraction of the unidentified spectra showed high quality and presumably derive from unknown sequence variants not present in the database.
This study demonstrates the high importance of the sample preparation workflow, and the obtained results will guide researchers in the field in optimizing sample preparation procedures.