Project description:Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify posttranslational modifications (PTMs) in a single-pass because of high false discovery rates (FDRs). A new approach to database searching enables Global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified nearly 2500 unique, high-confidence modified peptides comprising 31 different PTM types in single-pass database searches. Male C57BL/6J (B6) and CAST/EiJ (CAST) mice were purchased from The Jackson Laboratories (Bar Harbor, Maine) and housed in an environmentally controlled vivarium at the University of Wisconsin Biochemistry Department. Mice were provided standard rodent chow (Purina no. 5008) and water ad libitum, and maintained on a 12-hour light/dark cycle (6 AM – 6 PM). At 10 weeks of age, mice were sacrificed by CO2 asphyxiation. All animal procedures were preapproved by the University of Wisconsin Animal Care and Use Committee.
Project description:Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify posttranslational modifications (PTMs) in a single-pass because of high false discovery rates (FDRs). A new approach to database searching enables Global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified nearly 2500 unique, high-confidence modified peptides comprising 31 different PTM types in single-pass database searches.
Project description:Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single-pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.
Project description:Protein structure prediction and structural biology have entered a new era with an artificial intelligence-based approach encoded in the AlphaFold2 and the analogous RoseTTAfold methods. More than 200 million structures have been predicted by AlphaFold2 from their primary sequences and the models as well as the approach itself have naturally been examined from different points of view by experimentalists and bioinformaticians. Here, we assessed the degree to which these computational models can provide information on subtle structural details with potential implications for diverse applications in protein engineering and chemical biology and focused the attention on chalcogen bonds formed by disulphide bridges. We found that only 43% of the chalcogen bonds observed in the experimental structures are present in the computational models, suggesting that the accuracy of the computational models is, in the majority of the cases, insufficient to allow the detection of chalcogen bonds, according to the usual stereochemical criteria. High-resolution experimentally derived structures are therefore still necessary when the structure must be investigated in depth based on fine structural aspects.
Project description:As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].
Project description:The identification of protein biochemical functions based on their three-dimensional structures is strongly required in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrostatic surface of functional-site in proteins). Using this system, functional sites similar to those of phosphoenoylpyruvate carboxy kinase were detected in several mononucleotide-binding proteins, which have different folds. We also applied our method to a hypothetical protein, MJ0226 from Methanococcus jannaschii, and detected the mononucleotide binding site from the similarity to other proteins having different folds.
Project description:Workflows capable of determining glycopeptides in large-scale are missing in the field of glycoproteomics. We present an approach for automated annotation of intact glycopeptide mass spectra. The steps in adopting the Mascot search engine for intact glycopeptide analysis included: (i) assigning one letter codes for monosaccharides, (ii) linearizing glycan sequences and (iii) preparing custom glycoprotein databases. Automated annotation of both N- and O-linked glycopeptides was proven using standard glycoproteins. In a large-scale study, a total of 257 glycoproteins containing 970 unique glycosylation sites and 3447 non-redundant N-linked glycopeptide variants were identified in 24 serum samples. Thus, a single tool was developed that collectively allows the (i) elucidation of N- and O-linked glycopeptide spectra, (ii) matching glycopeptides to known protein sequences, and (iii) high-throughput, batch-wise analysis of large-scale glycoproteomics data sets.
Project description:MotivationThe tumour microenvironment (TME) contains various cells including stromal fibroblasts, immune and malignant cells, and its composition can be elucidated using single-cell RNA sequencing (scRNA-seq). scRNA-seq datasets from several cancer types are available, yet we lack a comprehensive database to collect and present related TME data in an easily accessible format.ResultsWe therefore built a TME scRNA-seq database, and created the R package TMExplorer to facilitate investigation of the TME. TMExplorer provides an interface to easily access all available datasets and their metadata. The users can search for datasets using a thorough range of characteristics. The TMExplorer allows for examination of the TME using scRNA-seq in a way that is streamlined and allows for easy integration into already existing scRNA-seq analysis pipelines.
Project description:For bottom-up proteomics, there are wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection--referred to as STEPS--utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.