Project description:Shotgun proteomics provides the most powerful analytical platform for global inventory of complex proteomes using liquid chromatography-tandem mass spectrometry (LC-MS/MS) and allows a global analysis of protein changes. Nevertheless, sampling of complex proteomes by current shotgun proteomics platforms is incomplete, and this contributes to variability in assessment of peptide and protein inventories by spectral counting approaches. Thus, shotgun proteomics data pose challenges in comparing proteomes from different biological states. We developed an analysis strategy using quasi-likelihood Generalized Linear Modeling (GLM), included in a graphical interface software package (QuasiTel) that reads standard output from protein assemblies created by IDPicker, an HTML-based user interface to query shotgun proteomic data sets. This approach was compared to four other statistical analysis strategies: Student t test, Wilcoxon rank test, Fisher's Exact test, and Poisson-based GLM. We analyzed the performance of these tests to identify differences in protein levels based on spectral counts in a shotgun data set in which equimolar amounts of 48 human proteins were spiked at different levels into whole yeast lysates. Both GLM approaches and the Fisher Exact test performed adequately, each with their unique limitations. We subsequently compared the proteomes of normal tonsil epithelium and HNSCC using this approach and identified 86 proteins with differential spectral counts between normal tonsil epithelium and HNSCC. We selected 18 proteins from this comparison for verification of protein levels between the individual normal and tumor tissues using liquid chromatography-multiple reaction monitoring mass spectrometry (LC-MRM-MS). This analysis confirmed the magnitude and direction of the protein expression differences in all 6 proteins for which reliable data could be obtained. Our analysis demonstrates that shotgun proteomic data sets from different tissue phenotypes are sufficiently rich in quantitative information and that statistically significant differences in proteins spectral counts reflect the underlying biology of the samples.
Project description:Spectral counting has become a commonly used approach for measuring protein abundance in label-free shotgun proteomics. At the same time, the development of data analysis methods has lagged behind. Currently most studies utilizing spectral counts rely on simple data transforms and posthoc corrections of conventional signal-to-noise ratio statistics. However, these adjustments can neither handle the bias toward high abundance proteins nor deal with the drawbacks due to the limited number of replicates. We present a novel statistical framework (QSpec) for the significance analysis of differential expression with extensions to a variety of experimental design factors and adjustments for protein properties. Using synthetic and real experimental data sets, we show that the proposed method outperforms conventional statistical methods that search for differential expression for individual proteins. We illustrate the flexibility of the model by analyzing a data set with a complicated experimental design involving cellular localization and time course.
Project description:A new result report for Mascot search results is described. A greedy set cover algorithm is used to create a minimal set of proteins, which is then grouped into families on the basis of shared peptide matches. Protein families with multiple members are represented by dendrograms, generated by hierarchical clustering using the score of the nonshared peptide matches as a distance metric. The peptide matches to the proteins in a family can be compared side by side to assess the experimental evidence for each protein. If the evidence for a particular family member is considered inadequate, the dendrogram can be cut to reduce the number of distinct family members.
Project description:A major scientific challenge at the present time for cancer research is the determination of the underlying biological basis for cancer development. It is further complicated by the heterogeneity of cancer's origin. Understanding the molecular basis of cancer requires studying the dynamic and spatial interactions among proteins in cells, signaling events among cancer cells, and interactions between the cancer cells and the tumor microenvironment. Recently, it has been proposed that large-scale protein expression analysis of cancer cell proteomes promises to be valuable for investigating mechanisms of cancer transformation. Advances in mass spectrometry technologies and bioinformatics tools provide a tremendous opportunity to qualitatively and quantitatively interrogate dynamic protein-protein interactions and differential regulation of cellular signaling pathways associated with tumor development. In this review, progress in shotgun proteomics technologies for examining the molecular basis of cancer development will be presented and discussed.
Project description:Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS ( www.openms.org ).
Project description:BackgroundSpectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using shotgun proteomics data. The crux spectral-counts command, implemented as part of the Crux software toolkit, implements four previously reported spectral counting methods, the spectral index (SI(N)), the exponentially modified protein abundance index (emPAI), the normalized spectral abundance factor (NSAF), and the distributed normalized spectral abundance factor (dNSAF).ResultsWe compared the reproducibility and the linearity relative to each protein's abundance of the four spectral counting metrics. Our analysis suggests that NSAF yields the most reproducible counts across technical and biological replicates, and both SI(N) and NSAF achieve the best linearity.ConclusionsWith the crux spectral-counts command, Crux provides open-source modular methods to analyze mass spectrometry data for identifying and now quantifying peptides and proteins. The C++ source code, compiled binaries, spectra and sequence databases are available at http://noble.gs.washington.edu/proj/crux-spectral-counts.
Project description:We describe an improved version of the data-independent acquisition (DIA) computational analysis tool DIA-Umpire, and show that it enables highly sensitive, untargeted, and direct (spectral library-free) analysis of DIA data obtained using the Orbitrap family of mass spectrometers. DIA-Umpire v2 implements an improved feature detection algorithm with two additional filters based on the isotope pattern and fractional peptide mass analysis. The targeted re-extraction step of DIA-Umpire is updated with an improved scoring function and a more robust, semiparametric mixture modeling of the resulting scores for computing posterior probabilities of correct peptide identification in a targeted setting. Using two publicly available Q Exactive DIA datasets generated using HEK-293 cells and human liver microtissues, we demonstrate that DIA-Umpire can identify similar number of peptide ions, but with better identification reproducibility between replicates and samples, as with conventional data-dependent acquisition. We further demonstrate the utility of DIA-Umpire using a series of Orbitrap Fusion DIA experiments with HeLa cell lysates profiled using conventional data-dependent acquisition and using DIA with different isolation window widths.
Project description:Osteosarcoma is a primary malignant bone tumor affecting adolescents and young adults. This study aimed to identify proteomic signatures that distinguish between different osteosarcoma subtypes, providing insights into their molecular heterogeneity and potential implications for personalized treatment approaches. Using advanced proteomic techniques, we analyzed FFPE tumor samples from a cohort of pediatric osteosarcoma patients representing four various subtypes. Differential expression analysis revealed a significant proteomic signature that discriminated between these subtypes, highlighting distinct molecular profiles associated with different tumor characteristics. In contrast, clinical determinants did not correlate with the proteome signature of pediatric osteosarcoma. The identified proteomics signature encompassed a diverse array of proteins involved in focal adhesion, ECM-receptor interaction, PI3K-Akt signaling pathways, and proteoglycans in cancer, among the top enriched pathways. These findings underscore the importance of considering the molecular heterogeneity of osteosarcoma during diagnosis or even when developing personalized treatment strategies. By identifying subtype-specific proteomics signatures, clinicians may be able to tailor therapy regimens to individual patients, optimizing treatment efficacy and minimizing adverse effects.
Project description:Protein assembly and biological interpretation of the assembled protein lists are critical steps in shotgun proteomics data analysis. Although most biological functions arise from interactions among proteins, current protein assembly pipelines treat proteins as independent entities. Usually, only individual proteins with strong experimental evidence, that is, confident proteins, are reported, whereas many possible proteins of biological interest are eliminated. We have developed a clique-enrichment approach (CEA) to rescue eliminated proteins by incorporating the relationship among proteins as embedded in a protein interaction network. In several data sets tested, CEA increased protein identification by 8-23% with an estimated accuracy of 85%. Rescued proteins were supported by existing literature or transcriptome profiling studies at similar levels as confident proteins and at a significantly higher level than abandoned ones. Applying CEA on a breast cancer data set, rescued proteins coded by well-known breast cancer genes. In addition, CEA generated a network view of the proteins and helped show the modular organization of proteins that may underpin the molecular mechanisms of the disease.