Project description:Observational, Multicenter, Post-market, Minimal risk, Prospective data collection of PillCam SB3 videos (including PillCam reports) and raw data files and optional collection of Enteroscopy reports
Project description:Here we present IPSA, an innovative web-based spectrum annotator that visualizes and characterizes peptide tandem mass spectra. A tool for the scientific community, IPSA can visualize peptides collected using a wide variety of experimental and instrumental configurations. Annotated spectra are customizable via a selection of interactive features and can be exported as editable scalable vector graphics to aid in the production of publication-quality figures. Single spectra can be analyzed through provided forms, while multiple peptide spectral matches can be uploaded directly to the server as CSV, MGF, or mzML files. IPSA facilitates the characterization of experimental MS/MS performance through the optional export of fragment ion statistics from one to many peptide spectral matches. This resource is made freely accessible at http://interactivepeptidespectralannotator.com, and the source code is available for inspection at https://github.com/dbrademan/IPSA-dev for custom implementations. This repository contains the raw data, sequence databases, and peptide identifications utilized in the manuscript.
Project description:We developed a method for measuring the stable carbon isotope composition of individual species in microbial communities using metaproteomics. We call this method “Direct Protein-SIF”. To benchmark it, we measured twenty pure culture species using both the Direct Protein-SIF method and Isotope Ratio Mass Spectrometry. Some of the pure cultures were measured in technical replicates to assess how consistent Protein-SIF measurements are between mass spectrometry runs. This submission thus contains 29 raw files for the pure cultures. See the table in the submission for details of which species was measured in which .raw file. We also included the Direct Protein-SIF specific isotope pattern files as well as the .mzML files and PSM files required as input for the Direct Protein-SIF software. In addition to the pure cultures, a protein reference material (MKH files) was measured. The respective .raw files and isotopic pattern files are also included in this submission (see publication for details on how the reference material is used to calibrate the method).
Project description:We describe "Aird", an open-source, computation-oriented format with controllable precision, flexible indexing strategies, and a high compression rate. Aird provides a novel compressor called Zlib-Diff-PforDelta (ZDPD) for m/z data. Compared with Zlib only, m/z data size is about 55% lower in Aird on average. With the high-speed decoding and encoding performance brought by the Single Instruction Multiple Data (SIMD) technology used in ZDPD, Aird takes only 33% of the decoding time of Zlib. We used the open dataset HYE, which contains 48 raw files from SCIEX TripleTOF 5600 and TripleTOF 6600 instruments. The total file size is 206 GB in the vendor format. It increases to 854 GB after conversion to mzML with 32-bit encoding precision, whereas it is only 189 GB with Aird. Aird uses JavaScript Object Notation (JSON) for metadata storage. Aird-SDK is written in Java, and AirdPro is a GUI client for vendor file conversion written in C#. They are freely available at https://github.com/CSi-Studio/Aird-SDK and https://github.com/CSi-Studio/AirdPro.
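The idea behind combining controllable precision with difference encoding can be illustrated with a minimal Python sketch. This is not the Aird or ZDPD implementation: it omits the PforDelta stage and any SIMD acceleration, and the function names `encode_mz` and `decode_mz`, the default `precision` of 1e-4, and the 32-bit integer layout are all assumptions made for illustration. It only shows why sorted m/z values, once quantized to integers and stored as successive differences, tend to compress better with zlib than raw floating-point values.

```python
import struct
import zlib


def encode_mz(mz_values, precision=1e-4):
    """Quantize sorted m/z values, delta-encode them, then zlib-compress.

    Hypothetical helper for illustration; values are assumed to fit in
    signed 32-bit integers after division by `precision`.
    """
    ints = [round(mz / precision) for mz in mz_values]
    deltas = []
    previous = 0
    for value in ints:
        # The first entry stores the value itself (difference from 0);
        # later entries store the difference from the preceding value.
        deltas.append(value - previous)
        previous = value
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(raw)


def decode_mz(blob, precision=1e-4):
    """Invert encode_mz: decompress, undo the deltas, rescale to m/z."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    values = []
    running = 0
    for delta in deltas:
        running += delta
        values.append(running * precision)
    return values


if __name__ == "__main__":
    # Synthetic, monotonically increasing m/z values (not real data).
    mz = [100.0 + 0.01 * i + 1e-6 * i * i for i in range(5000)]
    plain = zlib.compress(struct.pack(f"<{len(mz)}d", *mz))
    packed = encode_mz(mz)
    print(len(plain), len(packed))
```

The decoded values differ from the originals by at most the chosen precision, which is the trade-off implied by "controllable precision": a coarser precision yields smaller integers and smaller differences at the cost of rounding.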
Project description:This dataset consists of 44 raw MS files, comprising 27 DIA (SWATH) and 15 DDA runs acquired on a TripleTOF 5600 and two raw mass spectrometry files acquired on a Q Exactive. The composition of the dataset is described in the manuscript by Tsou et al., titled "DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics", Nature Methods, in press. Raw files are deposited here in ProteomeXchange and are associated with the DIA-Umpire processed data. All DIA-Umpire processed results for each sample, together with the DDA results, are deposited in separate folders. Also see the "DataSampleID.xlsx" file associated with this Readme file. Internal reference from the Gingras lab ProHits implementation: Project 94, Export version VS2 (Tsou_DIA-Umpire)
Project description:<p>Urine metabolomics is widely used for biomarker research in the fields of medicine and toxicology. As a consequence, characterization of the variations of the urine metabolome under basal conditions becomes critical in order to avoid confounding effects in cohort studies. Such physiological information is, however, very scarce in the literature and in metabolomics databases so far. Here we studied the influence of age, body mass index (BMI), and gender on metabolite concentrations in a large cohort of 183 adults by using liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS). We implemented a comprehensive statistical workflow for univariate hypothesis testing and modeling by orthogonal partial least-squares (OPLS).</p><p> This repository contains the data set from the negative ionization mode: 2 batches, 234 files (24 blanks + 26 QCs + 184 samples) in the Thermo .RAW (6.8 GB) and .mzML (18 GB) formats. The comprehensive analysis of this data set is publicly available on the Workflow4metabolomics.org e-infrastructure with two reference histories: 'W4M00002_Sacurine-comprehensive' corresponds to the preprocessing of the .mzML files, followed by signal drift and batch effect correction, normalization, filtering, statistics, and annotation of the peak table; 'W4M00001_Sacurine-statistics' starts with the peak table restricted to the 113 identified metabolites (see Roux et al. [1] for a full description and information about the annotation), and contains the statistical analysis (as described in the associated publication, except that the publication also describes the positive ionization mode). The intensities of the table provided in the m_sacurine.txt ISA file correspond to the peak table restricted to the 113 identified metabolites (i.e. they are identical to the input of the 'W4M00001_Sacurine-statistics' history). 
Note that in both histories, the HU_096 sample is filtered out during the Hotelling/Quantile/MissingValue quality control sample filter, leaving 183 samples for the subsequent statistical analyses. Notes: The 'sampling' field indicates the 9 successive weeks during which samples were collected. The 'subset' field indicates a subset of 36 files (6 blanks + 10 QCs + 20 samples) which still contain significant physiological variations (and can be used as e.g. demo or teaching material).</p><p> Acknowledgements: The authors are grateful to Philippe Rocca-Serra for his help in preparing the ISA files.</p><p><br></p><p> References:</p><p> [1] Roux A, Xu Y, Heilier JF, Olivier MF, Ezan E, Tabet JC, Junot C. 2012. Annotation of the Human Adult Urinary Metabolome and Metabolite Identification Using Ultra High Performance Liquid Chromatography Coupled to a Linear Quadrupole Ion Trap-Orbitrap Mass Spectrometer. Anal Chem. Aug 7;84(15):6429-37. doi: 10.1021/ac300829f.</p>
Project description:A collection of HEK293 SWATH-MS raw data files generated as part of the routine operation of the ProCan experimental facility. These files represent technical replicates, each being an aliquot from the same pooled digest, with fifteen runs collected on each of the six SCIEX TripleTOF 6600 mass spectrometers, giving a total of 90 raw data files.
Project description:This project contains raw data, intermediate files and results used to create the integrated map of protein expression in human cancer (including data from cell lines and tumours). The map is based on a joint reanalysis of 11 large-scale quantitative proteomics studies. The datasets were primarily retrieved from the PRIDE database, as well as from the MassIVE database and the CPTAC data portal. The raw files were manually curated in order to capture mass spectrometry acquisition parameters, experimental design and sample characteristics. The raw files were jointly processed with the MaxQuant computational platform using standard settings (see Data Processing Protocol). Due to the size of the data, the processing was done in two batches, denoted as the “celllines” and “tumours” analyses. In total, using 1% peptide spectrum match and protein false discovery rates, the analysis allowed identification of 21,580 protein groups in the cell lines dataset (MQ search results available in the ‘txt-celllines’ folder) and 13,441 protein groups in the tumours dataset (MQ search results available in the ‘txt-tumours’ folder).
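As a rough illustration of what a 1% false discovery rate threshold means at the peptide spectrum match level, the following Python sketch applies a generic target-decoy filter. It is not MaxQuant's procedure, and the function name `filter_at_fdr`, the `(score, is_decoy)` tuple layout, and the decoys/targets estimator are assumptions made for this example only.

```python
def filter_at_fdr(psms, fdr_threshold=0.01):
    """Keep target PSMs above the lowest score cutoff that meets the FDR.

    `psms` is an iterable of (score, is_decoy) tuples, where a higher
    score means a better match. The FDR at each rank is estimated as
    (number of decoys) / (number of targets) among all PSMs at or above
    that rank. Hypothetical helper for illustration only.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is acceptable.
        if targets > 0 and decoys / targets <= fdr_threshold:
            cutoff = rank
    # Decoys inside the accepted range are dropped from the result.
    return [psm for psm in ranked[:cutoff] if not psm[1]]


if __name__ == "__main__":
    # Synthetic example scores (not real search results).
    example = [(300 - i, False) for i in range(150)]
    example += [(150, True)]
    example += [(149 - i, False) for i in range(49)]
    example += [(100 - i, True) for i in range(5)]
    print(len(filter_at_fdr(example)))
```

In this scheme, lowering `fdr_threshold` moves the cutoff toward higher scores, so fewer identifications are reported but a smaller share of them is expected to be false.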
Project description:Genome-wide DNA Methylation Data from Illumina HumanMethylationEPIC arrays for whole blood samples from 570 healthy individuals. Raw IDAT files are available for a subset of 403 samples on EGA. Raw data (IDAT files) and associated phenotype information are available for all individuals included in this study (n=570) directly from CIBMTR. Data are available under controlled access release upon reasonable request and execution of a data use agreement. Requests should be submitted to CIBMTR at info-request@mcw.edu and include the study reference IB17-04.