ABSTRACT: Search Engine for Antimicrobial Resistance: a cloud compatible pipeline and web interface for rapidly detecting antimicrobial resistance genes directly from sequence data
Project description:Here we designed a search engine for single-cell epigenome profiles. We tested different applications of the search engine using different datasets, including mESC scATAC-seq profiles.
Project description:Mass spectrometry is a central technique in glycomic analysis. However, there is no generic software tool for automated, confident analysis of tandem mass spectrometry-based glycomic data. Here, we propose GlycoNote, a generic and reliable search engine for tandem mass spectrometry-based glycomics. A false discovery rate analysis based on iterative decoy searching was specifically designed for glycomic data analysis. We apply GlycoNote to the analysis of distinct glycomic samples, including human milk oligosaccharides, the N/O-glycome from a human cell line and polysaccharides from plants. To further demonstrate the general utility of GlycoNote, automated analyses of nonnative glycomes (N-glycome labeled with aniline and permethylated N-glycome) and atypical glycans (O-glycome with N-acetylneuraminic acid / N-glycome from C. elegans) were performed. More importantly, an open-search mode was introduced to elucidate component heterogeneity in samples. GlycoNote could be an important tool in the rapidly growing efforts toward comprehensive glycomic analysis.
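As a rough illustration of the decoy-based false discovery rate idea mentioned above, the following is a minimal sketch of a generic target-decoy estimate. It is not GlycoNote's iterative procedure, which the description does not detail. The function names, the (score, is_decoy) input format and the example scores are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Match = Tuple[float, bool]  # (score, is_decoy); hypothetical input format


def target_decoy_fdr(matches: List[Match], threshold: float) -> float:
    """Estimate the FDR among target matches scoring at or above threshold.

    Generic estimate: (# decoy matches) / (# target matches) above the cutoff.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No targets accepted: report 0 only if no decoys passed either.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)


def score_cutoff_for_fdr(matches: List[Match], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    # Walk candidate cutoffs from strictest to most permissive.
    for threshold in sorted({score for score, _ in matches}, reverse=True):
        if target_decoy_fdr(matches, threshold) <= max_fdr:
            best = threshold
    return best


# Made-up scores, for illustration only.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
print(target_decoy_fdr(example, 6.0))
print(score_cutoff_for_fdr(example, 0.3))
```

The cutoff search keeps the most permissive threshold that still satisfies the requested rate, which is the usual way such an estimate is turned into an acceptance criterion.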
Project description:A cross-linking mass spectrometry search engine was developed and implemented in Thermo Proteome Discoverer. The search engine can handle several linker types as well as several data input formats. To demonstrate its ability to process Bruker ion mobility data, synthetic peptides (Beveridge et al., Nat. Commun., 2020, doi: 10.1038/s41467-020-14608-2) were analyzed, and the respective files are available here.
Project description:Processing of the synthetic phosphopeptide dataset of Savitski et al. (MCP, 2011) using multiple search engines. Establishment of the D-score: a search engine-independent MD-score.
Project description:The standard platform for proteomics experiments today is mass spectrometry, particularly for samples derived from complex matrices. Recent increases in mass spectrometry sequencing speed, sensitivity and resolution now permit comprehensive coverage of even the most precious and limited samples, particularly when coupled with improvements in protein extraction techniques and chromatographic separation. However, the results obtained from laborious sample extraction and expensive instrumentation are often hindered by suboptimal data processing pipelines. One critical data processing step is peptide sequencing, which is most commonly done through database search engines. In almost all MS/MS search engines, users must limit their search space due to time constraints and q-value considerations. In nearly all experiments, the search is limited to a canonical database that typically does not reflect the individual genetic variations of the organism being studied. Searching for post-translational modifications can exponentially increase the search space, so the modifications to include must be selected carefully. In addition, engines will nearly always assume the presence of only fully tryptic peptides. Despite these stringent parameters, proteomic data searches may take hours or even days to complete, and relaxing even one of these criteria to more realistic biological settings leads to detrimental increases in search time on expensive, custom data processing towers. Even on high-performance servers, these search engines are computationally expensive, and most users decide to dial back their search parameters. We present Bolt, a new search engine that can search more than nine hundred thousand protein sequences (canonical, isoform, mutations, and contaminants) with 31 post-translational modifications and N-terminal and C-terminal partial tryptic search in a matter of minutes on a standard configuration laptop.
Along with increases in speed, Bolt provides the additional benefit of more high-confidence identifications, as demonstrated by manual validation of unique peptides identified by Bolt that were missed by parallel searches with standard engines. When the engines disagree, 67% of peptides identified by Bolt can be manually validated by strong fragmentation patterns, compared to 14% of peptides uniquely identified by SEQUEST. Bolt represents, to the best of our knowledge, the first fully scalable, cloud-based quantitative proteomic solution that can be operated within a user-friendly GUI.
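The remark in the Bolt description that post-translational modifications can inflate the search space can be made concrete with a small count. The sketch below is a simplified combinatorial model, not a description of how Bolt or any other engine enumerates candidates. It assumes every modifiable site on a peptide can carry any one of the allowed modifications, and the function and parameter names are hypothetical.

```python
from math import comb


def modified_forms(sites: int, mods_per_site: int, max_mods: int) -> int:
    """Count the distinct modified forms of one peptide.

    sites:         number of modifiable residues (assumed independent)
    mods_per_site: alternative modifications allowed at each site
    max_mods:      cap on simultaneously modified sites

    Choosing k of the sites and one modification for each gives
    C(sites, k) * mods_per_site**k forms; sum over k = 0..cap.
    """
    limit = min(sites, max_mods)
    return sum(comb(sites, k) * mods_per_site ** k for k in range(limit + 1))


# With no cap, the count grows exponentially in the number of sites:
for n in (2, 4, 8):
    print(n, modified_forms(n, 1, n))

# Capping the number of modified sites per peptide restrains the growth.
print(modified_forms(8, 1, 2))
```

With one modification per site and no cap, the count is 2 to the power of the number of sites, which is why search engines commonly limit both the modification list and the number of modifications per peptide.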
Project description:This submission includes the raw data analyzed and search results described in our manuscript “Proteome-Scale Recombinant Standards And A Robust High-Speed Search Engine To Advance Cross-Linking MS-Based Interactomics”. In this study, we develop a strategy to generate a well-controlled XL-MS standard by systematically mixing and cross-linking recombinant proteins. The standard can be split into independent datasets, each of which has the MS2-level complexity of a typical proteome-wide XL-MS experiment. The raw datasets included in this submission were used to (1) guide the development of Scout, a machine learning-based search engine for XL-MS with MS-cleavable cross-linkers (batch 1), (2) test different LC-MS acquisition methods (batch 2), and (3) directly compare Scout to widely used XL-MS search engines (batches 3 and 4).