Project description:We present a statistical model to estimate the accuracy of derivatized heparin and heparan sulfate (HS) glycosaminoglycan (GAG) assignments to tandem mass (MS/MS) spectra made by the first published database search application, GAG-ID. Employing a multivariate expectation-maximization algorithm, this statistical model distinguishes correct from ambiguous and incorrect database search results when computing the probability that heparin/HS GAG assignments to spectra are correct based upon database search scores. Using GAG-ID search results for spectra generated from a defined mixture of 21 synthesized tetrasaccharide sequences as well as seven spectra of longer defined oligosaccharides, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly, ambiguously, and incorrectly assigned heparin/HS GAGs. This analysis makes it possible to filter large MS/MS database search results with predictable false identification error rates.
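A minimal sketch of how per-assignment probabilities can translate into a predictable error rate when filtering. It assumes each assignment already carries a probability of being correct; the function name `expected_error_rate` and the toy values are hypothetical and are not taken from GAG-ID or the model described above.

```python
def expected_error_rate(probabilities, min_probability):
    """Return (expected false identification rate, number accepted).

    Assumption: each value is the probability that an assignment is correct,
    so 1 - p is its expected contribution to the false identifications.
    """
    accepted = [p for p in probabilities if p >= min_probability]
    if not accepted:
        return 0.0, 0
    rate = sum(1.0 - p for p in accepted) / len(accepted)
    return rate, len(accepted)


# Hypothetical probabilities for five spectrum assignments.
probs = [0.99, 0.95, 0.90, 0.60, 0.20]
rate, n_accepted = expected_error_rate(probs, min_probability=0.9)
```

Raising `min_probability` lowers the expected error rate at the cost of fewer accepted assignments, which is the trade-off a filtering threshold controls.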
Project description:For bottom-up proteomics, there is a wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, numerous strategies are employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection, referred to as STEPS, utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
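The grid-search idea can be sketched as follows. This is an illustration under stated assumptions, not the STEPS implementation: the two filter parameters (`score`, `delta`), the decoy-based error estimate, and the 10% cutoff are all hypothetical choices made for the example.

```python
from itertools import product


def grid_search_filters(psms, score_grid, delta_grid, max_fdr=0.1):
    """Try every (min_score, min_delta) pair and keep the one that passes
    the most target PSMs while the decoy/target ratio stays <= max_fdr."""
    best = None
    for min_score, min_delta in product(score_grid, delta_grid):
        passing = [p for p in psms
                   if p["score"] >= min_score and p["delta"] >= min_delta]
        targets = sum(1 for p in passing if not p["decoy"])
        decoys = len(passing) - targets
        if targets == 0:
            continue
        fdr = decoys / targets  # assumed estimator: decoys per target
        if fdr <= max_fdr and (best is None or targets > best["targets"]):
            best = {"min_score": min_score, "min_delta": min_delta,
                    "targets": targets, "fdr": fdr}
    return best


# Hypothetical peptide-spectrum matches (PSMs).
psms = [
    {"score": 3.0, "delta": 0.30, "decoy": False},
    {"score": 2.5, "delta": 0.25, "decoy": False},
    {"score": 2.0, "delta": 0.20, "decoy": False},
    {"score": 1.5, "delta": 0.05, "decoy": True},
    {"score": 1.8, "delta": 0.15, "decoy": False},
    {"score": 1.2, "delta": 0.12, "decoy": True},
]
best = grid_search_filters(psms, score_grid=[1.0, 1.5, 2.0],
                           delta_grid=[0.0, 0.1, 0.2])
```

On this toy data the loosest settings admit decoys and the strictest discard a true match, so the selected combination sits in between.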
Project description:In shotgun proteomics, the analysis of tandem mass spectrometry data from peptides can benefit greatly from high mass accuracy measurements. In this study, we have evaluated two database search strategies which use high mass accuracy measurements of the peptide precursor ion. Our results indicate that peptide identifications are improved when spectra are searched with a wide mass tolerance window and precursor mass is used as a filter to discard incorrect matches. Database searches with a peptide data set constrained to peptides within a narrow mass window resulted in fewer peptide identifications but a significantly faster database search time.
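The first strategy (search with a wide window, then discard matches by precursor mass) can be sketched as a post-search filter. The field names, the 10 ppm cutoff, and the example values are assumptions for illustration only; the study's actual tolerances are not stated above.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def filter_by_precursor(psms, max_ppm=10.0):
    """Keep matches whose precursor mass error is within +/- max_ppm."""
    return [p for p in psms
            if abs(ppm_error(p["observed_mz"], p["theoretical_mz"])) <= max_ppm]


# Hypothetical matches returned by a wide-tolerance search.
psms = [
    {"peptide": "PEPTIDEK", "observed_mz": 500.2510, "theoretical_mz": 500.2500},
    {"peptide": "DECOYSEQ", "observed_mz": 500.2510, "theoretical_mz": 501.0000},
]
kept = filter_by_precursor(psms)
```

The first match is off by about 2 ppm and survives; the second is off by well over 1000 ppm and is discarded, even though the wide search window allowed it to be scored.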
Project description:In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs; in both cases, the yield of peptides detected at a given false discovery rate is lowered. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. The implementation, Param-Medic, is open-source and cross-platform. It is available as a standalone tool (http://noble.gs.washington.edu/proj/param-medic/) and has been integrated into the Crux proteomics toolkit (http://crux.ms), providing automatic parameter selection for the Comet and Tide search engines.
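A rough sketch of the underlying idea: if two spectra come from the same peptide ion, the difference between their measured precursor m/z values reflects measurement error alone. The estimator below (a scaled median absolute deviation) is a stand-in chosen for simplicity and is not Param-Medic's actual model; the pairing step is assumed to have been done already, and all values are hypothetical.

```python
import math
import statistics


def paired_ppm_differences(precursor_pairs):
    """ppm difference between the two precursor m/z values in each pair."""
    return [(a - b) / ((a + b) / 2.0) * 1e6 for a, b in precursor_pairs]


def robust_sigma(values):
    """Spread estimate from the median absolute deviation (MAD);
    1.4826 * MAD approximates a standard deviation for normal data."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return 1.4826 * mad


# Hypothetical m/z pairs assumed to originate from the same peptide ion.
pairs = [(500.001, 500.000), (500.000, 500.001),
         (600.0006, 600.0), (600.0, 600.0006), (700.0, 700.0)]
diffs = paired_ppm_differences(pairs)
sigma_diff = robust_sigma(diffs)
# A difference of two measurements has sqrt(2) times the single-measurement spread.
sigma_single = sigma_diff / math.sqrt(2.0)
```

A tolerance could then be set as some multiple of `sigma_single`; the choice of multiple is left open here because it is not given in the description above.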
Project description:Cashew is one of the most prevalent causes of tree nut allergies. However, the cashew proteome is far from complete, which limits the quality of peptide identification in mass spectrometric analyses. In this study, bioinformatics tools were utilized to construct a customized cashew protein database and improve sequence quality for proteins of interest, based on a publicly available cashew genome database. As a result, two additional isoforms for cashew 2S albumins and five other isoforms for cashew 11S proteins were identified, along with several other potential allergens. Using the optimized protein database, the protein profiles of cashew nuts subjected to different oil-roasting conditions (138 °C and 166 °C for 2-10 minutes) were analyzed using discovery LC-MS/MS analysis. The results showed that the cashew 2S protein is the most heat-stable, followed by the 11S and 7S proteins, though protein isoforms might be affected differently. Preliminary target peptide selection indicated that, out of the 29 potential targets, 18 peptides were derived from the newly developed database. In the evaluation of thermal processing effects on cashew proteins, several Maillard reaction adducts were also identified. The cashew protein database developed in this study allows for comprehensive analyses of the cashew proteome and the development of high-quality allergen detection methods.
Project description:Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyse tandem mass spectra are lagging behind. We present a database search tool MS-GF+ that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral data sets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; and (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these data sets, MS-GF+ significantly increases the number of identified peptides compared with commonly used methods for peptide identifications. We emphasize that although MS-GF+ is not specifically designed for any particular experimental set-up, it improves on the performance of tools specifically designed for these applications (for example, specialized tools for phosphoproteomics).
Project description:RNA Polymerase II ChIP-chip using a polyclonal antibody (N-20), performed on GM06990 cells with Nimblegen ENCODE arrays, which comprise 50-mer oligonucleotides spaced every 38 bp (overlapping by 12 nt). The goal was to identify Pol II-binding regions. Use of this data requires permission from its producers. Keywords: ChIP-chip
Project description:RNA Polymerase II ChIP-chip using a polyclonal antibody (N-20), performed on HeLaS3 cells with Nimblegen ENCODE arrays, which comprise 50-mer oligonucleotides spaced every 38 bp (overlapping by 12 nt). The goal was to identify Pol II-binding regions. Use of this data requires permission from its producers. Keywords: ChIP-chip
Project description:Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum data set. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
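The mixture-modeling step can be illustrated with a deliberately simplified sketch: expectation-maximization (EM) on a one-dimensional discriminant score with two Gaussian components, one for correct and one for incorrect assignments. This is an assumption-laden toy, not PeptideProphet's model or the adaptive method described above; the initialization, the Gaussian form for both components, and the scores are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(s, prior, good, bad):
    """Posterior probability that score s belongs to the 'correct' component."""
    g = prior * normal_pdf(s, *good)
    b = (1.0 - prior) * normal_pdf(s, *bad)
    return g / (g + b) if g + b > 0 else 0.5


def fit_two_component_em(scores, n_iter=50):
    lo, hi = min(scores), max(scores)
    sd0 = max((hi - lo) / 4.0, 1e-3)
    good, bad, prior = (hi, sd0), (lo, sd0), 0.5  # (mean, sd) per component
    n = len(scores)
    for _ in range(n_iter):
        # E-step: responsibility of the 'correct' component for each score.
        post = [posterior_correct(s, prior, good, bad) for s in scores]
        w_good = sum(post)
        w_bad = n - w_good
        if w_good < 1e-9 or w_bad < 1e-9:
            break  # one component has collapsed; stop updating
        # M-step: weighted means, standard deviations, and mixing proportion.
        m_g = sum(p * s for p, s in zip(post, scores)) / w_good
        m_b = sum((1 - p) * s for p, s in zip(post, scores)) / w_bad
        v_g = sum(p * (s - m_g) ** 2 for p, s in zip(post, scores)) / w_good
        v_b = sum((1 - p) * (s - m_b) ** 2 for p, s in zip(post, scores)) / w_bad
        good = (m_g, max(math.sqrt(v_g), 1e-3))
        bad = (m_b, max(math.sqrt(v_b), 1e-3))
        prior = w_good / n
    return [posterior_correct(s, prior, good, bad) for s in scores]


# Hypothetical discriminant scores: five low (incorrect) and five high (correct).
scores = [0.1, 0.3, 0.2, 0.4, 0.25, 3.9, 4.1, 4.0, 4.2, 3.8]
probs = fit_two_component_em(scores)
```

With fixed discriminant coefficients only the mixture parameters are learned; the adaptive method described above additionally relearns the discriminant function itself, which this sketch does not attempt.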
Project description:Commercially available antibodies raised against condensin subunits have been widely used to characterise their cellular interactome. Here we have assessed the specificity of a polyclonal antibody (Bethyl A302-276A) that is commonly used as a probe for NCAPH2, the kleisin subunit of condensin II, in mammalian cells. We find that, in addition to its intended target, this antibody cross-reacts with one or more components of the SWI-SNF family of chromatin remodelling complexes in an NCAPH2-independent manner.