Project description: Proteomics is presently dominated by the "bottom-up" strategy, in which proteins are enzymatically digested into peptides for mass spectrometric identification. Although this approach is highly effective at identifying large numbers of proteins present in complex samples, the digestion into peptides renders it impossible to identify the proteoforms from which they were derived. We present here a powerful new strategy for identifying proteoforms and elucidating proteoform families (groups of related proteoforms) from two experimentally determined properties: the accurate proteoform mass and the number of lysine residues the proteoform contains. Accurate proteoform masses are determined by standard LC-MS analysis of undigested protein mixtures in an Orbitrap mass spectrometer, and the lysine count is determined using the NeuCode isotopic tagging method. We demonstrate the approach in an analysis of the yeast proteome, revealing 8637 unique proteoforms and 1178 proteoform families. The elucidation of proteoforms and proteoform families afforded here provides an unprecedented perspective on proteome complexity and dynamics.
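As an illustration of how a lysine count can follow from NeuCode measurements, the sketch below divides the mass spacing between the light and heavy members of a proteoform doublet by the per-lysine spacing of the label pair. It assumes the pair is 13C6 15N2-lysine versus 2H8-lysine (both about +8 Da, differing by about 0.036 Da per lysine, computed here from isotope masses); the function name `lysine_count` and its tolerance parameter are hypothetical, and this is not the authors' implementation.

```python
from typing import Optional

# Monoisotopic masses of the relevant isotopes (Da).
MASS_C12 = 12.0
MASS_C13 = 13.0033548350
MASS_N14 = 14.0030740044
MASS_N15 = 15.0001088989
MASS_H1 = 1.00782503207
MASS_H2 = 2.0141017778

# ASSUMPTION: the NeuCode lysine pair is 13C6 15N2-lysine (lighter) and 2H8-lysine (heavier).
LIGHT_LABEL_SHIFT = 6 * (MASS_C13 - MASS_C12) + 2 * (MASS_N15 - MASS_N14)
HEAVY_LABEL_SHIFT = 8 * (MASS_H2 - MASS_H1)
# Spacing contributed by each lysine between the two doublet members (about 0.036 Da).
PER_LYSINE_DELTA = HEAVY_LABEL_SHIFT - LIGHT_LABEL_SHIFT


def lysine_count(light_mass: float, heavy_mass: float,
                 max_fraction_off: float = 0.3) -> Optional[int]:
    """Estimate the lysine count of a proteoform from its NeuCode doublet.

    light_mass, heavy_mass: deconvoluted monoisotopic masses (Da) of the paired peaks.
    max_fraction_off: hypothetical tolerance, as a fraction of one lysine spacing.
    Returns None when the spacing is not close to a whole, positive number of lysines.
    """
    ratio = (heavy_mass - light_mass) / PER_LYSINE_DELTA
    count = round(ratio)
    if count < 1 or abs(ratio - count) > max_fraction_off:
        return None
    return count


print(lysine_count(10000.0, 10000.36))
```

A measured spacing of 0.36 Da corresponds to about 10 lysines; a spacing that falls halfway between two integers is rejected rather than rounded, since it more likely reflects a mispaired doublet.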
Project description: Intact-mass analyses (HPLC-ESI-MS) of GELFREE-separated fractions of yeast. A total of 66 raw files are included, representing two replicate analyses.
Project description: This data set contains the following:
1) 66 raw files for NeuCode-labeled human proteoforms from the Jurkat cell line analyzed via "intact-mass" (i.e., LC-MS, with no precursor fragmentation). Three biological replicates were performed (files from the first, second, and third replicates are labeled 022317, 031617, and 031817, respectively). For each replicate, proteoforms were separated offline using a 12% Tris-acetate Gelfree cartridge and 11 fractions were collected. Two technical replicate LC-MS injections of each fraction were performed, yielding a total of 66 raw files (3 biological replicates x 11 fractions x 2 injections).
2) 22 raw files for label-free human proteoforms from the Jurkat cell line analyzed via "top-down" (i.e., LC-MS/MS, with precursor fragmentation). One biological replicate was performed (labeled 032017). Proteoforms were separated offline using a 12% Tris-acetate Gelfree cartridge and 11 fractions were collected. Two technical replicate LC-MS/MS injections of each fraction were performed, yielding a total of 22 raw files (1 biological replicate x 11 fractions x 2 injections).
3) Multi-protease and trypsin-only pruned G-PTM-D databases used for proteoform identification. Note that the bottom-up data used to generate these databases can be found elsewhere on MassIVE (MSV000083304).
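The run counts in the list follow directly from the experimental design (replicates x fractions x injections). The sketch below enumerates those runs and assigns a file to its biological replicate by searching for the date label. Only the date labels and counts come from the description; the example file name is hypothetical, since the actual naming pattern of the raw files is not given here.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Date labels taken from the data set description.
INTACT_MASS_REPLICATES = ("022317", "031617", "031817")
TOP_DOWN_REPLICATES = ("032017",)
N_FRACTIONS = 11
N_INJECTIONS = 2


def expected_runs(replicates: Sequence[str],
                  n_fractions: int = N_FRACTIONS,
                  n_injections: int = N_INJECTIONS) -> List[Tuple[str, int, int]]:
    """Every (replicate, fraction, injection) combination in the design."""
    return list(product(replicates,
                        range(1, n_fractions + 1),
                        range(1, n_injections + 1)))


def replicate_of(filename: str, labels: Sequence[str]) -> Optional[str]:
    """Return the single date label contained in a file name, else None."""
    matches = [label for label in labels if label in filename]
    return matches[0] if len(matches) == 1 else None


print(len(expected_runs(INTACT_MASS_REPLICATES)))  # intact-mass runs
print(len(expected_runs(TOP_DOWN_REPLICATES)))     # top-down runs
# Hypothetical file name used only for illustration.
print(replicate_of("Jurkat_022317_F03_inj1.raw", INTACT_MASS_REPLICATES))
```

Comparing the enumerated design against the downloaded file list is a quick way to confirm that no fraction or injection is missing before analysis.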
Project description: A proteoform family is a group of related molecular forms of a protein (proteoforms) derived from the same gene. We have previously described a strategy to identify proteoforms and elucidate proteoform families in complex mixtures of intact proteins. The strategy is based upon measurements of two properties for each proteoform: (i) the accurate intact mass of the proteoform, measured by liquid chromatography/mass spectrometry (LC-MS), and (ii) the number of lysine residues in each proteoform, determined using an isotopic labeling approach. These measured properties are then compared with those extracted from a catalog of theoretical proteoforms containing protein sequences and localized post-translational modifications (PTMs) for the organism under study. A match between the measured properties and those in the catalog constitutes an identification of the proteoform. In the present study, this strategy is extended by utilizing a global PTM discovery database and is applied to the widely studied model organism Escherichia coli, providing the most comprehensive elucidation of E. coli proteoforms and proteoform families to date.
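The matching step described above can be sketched as a filter over the theoretical catalog: keep entries whose lysine count equals the measured count and whose mass lies within a ppm tolerance of the measured mass. The 10 ppm tolerance, the class and function names, and the catalog entries are all hypothetical; the example shows how the lysine count separates two candidates that mass alone cannot distinguish.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TheoreticalProteoform:
    accession: str      # hypothetical identifiers in the example catalog
    description: str
    mass: float         # monoisotopic mass, Da
    lysine_count: int


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_proteoform(observed_mass: float,
                     observed_lysines: int,
                     catalog: Sequence[TheoreticalProteoform],
                     tolerance_ppm: float = 10.0) -> List[TheoreticalProteoform]:
    """Catalog entries with the same lysine count and a mass within tolerance, best first."""
    # ASSUMPTION: 10 ppm is an illustrative tolerance, not the published setting.
    hits = [
        entry for entry in catalog
        if entry.lysine_count == observed_lysines
        and abs(ppm_error(observed_mass, entry.mass)) <= tolerance_ppm
    ]
    return sorted(hits, key=lambda entry: abs(ppm_error(observed_mass, entry.mass)))


# Hypothetical catalog: two entries lie within 10 ppm of 12042.02 Da.
catalog = [
    TheoreticalProteoform("PROT_A", "unmodified", 12000.0000, 10),
    TheoreticalProteoform("PROT_A", "N-terminal acetylation", 12042.0106, 10),
    TheoreticalProteoform("PROT_B", "unmodified", 12042.0500, 8),
]
print([h.accession for h in match_proteoform(12042.02, 10, catalog)])
print([h.accession for h in match_proteoform(12042.02, 8, catalog)])
```

Both PROT_A (acetylated) and PROT_B are within tolerance of 12042.02 Da, so the measured lysine count is what makes the identification unambiguous.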
Project description: Raw top-down LC-MS/MS files from 12% GELFREE-separated fractions of yeast. Raw files for two technical replicates of each fraction are included, 24 raw files in total.
Project description: We present an open-source, interactive program named Proteoform Suite that uses proteoform mass and intensity measurements from complex biological samples to identify and quantify proteoforms. It constructs families of proteoforms derived from the same gene, assesses proteoform function using gene ontology (GO) analysis, and enables visualization of quantified proteoform families and their changes. It is applied here to reveal systemic proteoform variations in the yeast response to salt stress.
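One way to picture family construction is as a graph: proteoforms are nodes, an edge joins two proteoforms with the same lysine count whose mass difference matches a known modification mass, and each connected component is a candidate family. The sketch below implements that idea with union-find. It is a simplified illustration, not Proteoform Suite's code: the modification table, the 0.01 Da tolerance, and the function name `build_families` are hypothetical, and only experimental proteoforms are compared here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical table of modification mass differences (Da) used to draw edges.
MASS_SHIFTS: Dict[str, float] = {
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
    "oxidation": 15.994915,
}


def build_families(proteoforms: Sequence[Tuple[float, int]],
                   tolerance_da: float = 0.01) -> List[List[int]]:
    """Group (mass, lysine_count) tuples into families; returns lists of indices."""
    parent = list(range(len(proteoforms)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    for i, (mass_i, lys_i) in enumerate(proteoforms):
        for j in range(i + 1, len(proteoforms)):
            mass_j, lys_j = proteoforms[j]
            if lys_i != lys_j:
                continue  # proteoforms of one gene share the lysine count
            delta = abs(mass_j - mass_i)
            if any(abs(delta - shift) <= tolerance_da for shift in MASS_SHIFTS.values()):
                union(i, j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(proteoforms)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


observed = [(12000.0, 10), (12042.0106, 10), (12121.976931, 10),
            (15000.0, 7), (15079.9663, 7), (12042.0, 9)]
print(build_families(observed))
```

Because edges are transitive through union-find, the unmodified, acetylated, and acetylated-plus-phosphorylated forms end up in one family even though the first and last differ by a mass not in the table.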
Project description: In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry and proteoforms, which are defined forms of a protein with specific sequences of amino acids and localized post-translational modifications, are identified using precursor mass and fragmentation data. Many proteoforms that are detected in the precursor scan (MS1) are not selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open-source software program Proteoform Suite to analyze MS1-only intact proteoform data. Here, we have adapted it to identify proteoform masses observed in the precursor MS1 spectra of top-down data, supplementing the top-down identifications obtained using the MS2 fragmentation data. Proteoform Suite performs mass calibration using high-scoring top-down identifications and identifies additional proteoforms using calibrated, accurate intact masses. Proteoform families, the set of proteoforms from a given gene, are constructed and visualized from proteoforms identified by both top-down and intact-mass analyses. Using this strategy, we constructed proteoform families and identified 1861 proteoforms in yeast lysate, yielding an approximately 40% increase over the original 1291 proteoform identifications observed using traditional top-down analysis alone.
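The idea of calibrating intact masses with high-scoring top-down identifications can be illustrated with the simplest possible model: a single global ppm offset, taken as the median error of confident identifications and removed from every MS1 mass. This is a swapped-in simplification; Proteoform Suite's actual calibration may model the error differently. The score threshold and the function name `calibrate_masses` are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def calibrate_masses(masses: Sequence[float],
                     td_hits: Sequence[Tuple[float, float, float]],
                     min_score: float) -> Tuple[List[float], float]:
    """Correct MS1 masses with a global ppm offset learned from top-down hits.

    td_hits: (observed_mass, theoretical_mass, score) for top-down identifications.
    min_score: hypothetical cutoff defining a "high-scoring" identification.
    Returns the corrected masses and the offset (ppm) that was removed.
    """
    errors = [ppm_error(obs, theo) for obs, theo, score in td_hits if score >= min_score]
    if not errors:
        return list(masses), 0.0  # nothing trustworthy to calibrate against
    offset_ppm = median(errors)   # median resists occasional misidentifications
    # observed = true * (1 + offset * 1e-6), so divide to recover the true mass.
    corrected = [m / (1 + offset_ppm * 1e-6) for m in masses]
    return corrected, offset_ppm


theoretical = [10000.0, 20000.0, 15000.0]
hits = [(t * (1 + 5e-6), t, 50.0) for t in theoretical]  # systematic +5 ppm error
hits.append((9000.9, 9000.0, 1.0))                       # low-score outlier, ignored
corrected, offset = calibrate_masses([30000.0 * (1 + 5e-6)], hits, min_score=10.0)
print(round(offset, 3), round(corrected[0], 4))
```

After calibration, the corrected masses can be matched against theoretical proteoforms with a tighter tolerance than the uncalibrated data would allow.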
Project description: Primary diploid cells exit the cell cycle in response to exogenous stress or oncogene activation through a process known as cellular senescence. This cell-autonomous tumor-suppressive mechanism is also a major mechanism operative in organismal aging. To date, temporal aspects of senescence remain understudied. Therefore, we use quantitative proteomics to investigate changes following forced HRAS G12V expression and induction of senescence over 1 week in normal diploid fibroblasts. We demonstrate that global intracellular proteomic changes correlate with the emergence of the senescence-associated secretory phenotype and the switch to robust cell cycle exit. The senescence secretome reinforces cell cycle exit, yet is largely detrimental to tissue homeostasis. Previous studies of secretomes have relied on ELISA, bottom-up proteomics, or RNA-seq. No study to date has examined the proteoform complexity of secretomes to elucidate isoform-specific post-translational modifications or regulated cleavage of signal peptides. Therefore, we use a quantitative top-down proteomics approach to define the molecular complexity of secreted proteins <30 kDa. We identify multiple forms of immune regulators with known activities and affinities, such as distinct forms of interleukin-8, as well as GRO? and HMGA1, and temporally resolve secreted proteoform dynamics. Together, our work demonstrates the complexity of the secretome beyond individual protein accessions and provides motivation for further proteoform-resolved measurements of the secretome.