Project description: The presence of numerous chemical contaminants from industrial, agricultural, and pharmaceutical sources in water supplies poses a potential risk to human and ecological health. Current chemical analyses are limited by incomplete chemical coverage and high cost, and broad-coverage in vitro assays such as transcriptomics may further improve water quality monitoring by assessing a wide range of possible effects. Here, we used high-throughput transcriptomics to assess the activity induced by field-derived water extracts in MCF7 breast carcinoma cells.
Project description: Protein turnover is vital for cellular function and is often associated with the pathophysiology of a variety of diseases. Metabolic labeling with heavy water followed by liquid chromatography coupled to mass spectrometry is a powerful tool for studying in vivo protein turnover at high throughput and on a large scale. Heavy water is a cost-effective, easy-to-use labeling agent that labels all nonessential amino acids. Because heavy water is toxic at high concentrations (20% or higher), low enrichments (8% or less) are used with most organisms. The low enrichment results in incomplete labeling of peptides/proteins, which makes data processing more challenging: it requires accurate quantification of the labeled and unlabeled forms of a peptide from overlapping mass isotopomer distributions. This work describes the bioinformatics aspects of analyzing heavy-water-labeled mass spectral data, the available software tools, and current challenges and opportunities.
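The central quantification step described above, separating labeled from unlabeled forms within an overlapping isotopomer distribution, can be sketched as a one-parameter least-squares fit. This is a minimal illustration, not the method of any particular tool. It assumes that the theoretical unlabeled and plateau (labeled) distributions are already known. Real software derives these from the peptide's elemental composition, its number of labeling sites, and the heavy water enrichment. The function name estimate_labeled_fraction is hypothetical.

```python
from typing import Sequence


def estimate_labeled_fraction(
    observed: Sequence[float],
    unlabeled: Sequence[float],
    labeled: Sequence[float],
) -> float:
    """Least-squares estimate of the labeled fraction f in the mixture model

        observed ~ (1 - f) * unlabeled + f * labeled

    Inputs are relative isotopomer abundances (M0, M1, M2, ...) on the same
    scale. The theoretical unlabeled/labeled distributions are assumed to be
    supplied by the caller (hypothetical inputs in this sketch).
    """
    if not (len(observed) == len(unlabeled) == len(labeled)):
        raise ValueError("distributions must have the same length")
    # Rewrite the model as (observed - unlabeled) ~ f * (labeled - unlabeled)
    # and solve this one-parameter linear least-squares problem in closed form.
    num = sum((o - u) * (lab - u) for o, u, lab in zip(observed, unlabeled, labeled))
    den = sum((lab - u) ** 2 for u, lab in zip(unlabeled, labeled))
    if den == 0:
        raise ValueError("labeled and unlabeled distributions are identical")
    # Clamp to the physically meaningful range [0, 1].
    return min(1.0, max(0.0, num / den))


unlabeled = [0.50, 0.30, 0.15, 0.05]
labeled = [0.30, 0.35, 0.22, 0.13]
# Synthetic observation: 60% unlabeled + 40% labeled.
observed = [0.6 * u + 0.4 * lab for u, lab in zip(unlabeled, labeled)]
print(round(estimate_labeled_fraction(observed, unlabeled, labeled), 3))  # 0.4
```

Because the fit is linear in f, a closed-form solution is sufficient. Noisy spectra or a poorly chosen theoretical distribution will bias f, which is one reason that accurate theoretical isotopomer modeling matters at low enrichment.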
Project description: The identification of peptides and proteins by LC-MS/MS requires bioinformatics. Tools developed in the Tabb Laboratory add significant flexibility and discrimination to this process. The Bumbershoot tools (MyriMatch, DirecTag, TagRecon, and Pepitome) identify the peptides represented by MS/MS scans. All of these tools can read instrument files from multiple vendors directly, such as Thermo RAW format, or standard XML-based formats, such as mzML or mzXML. Peptide identifications are written in mzIdentML or pepXML format. Protein assembly is handled by the IDPicker algorithm. Raw identifications are filtered to a confident set using the target-decoy strategy. IDPicker arranges large sets of input files into a hierarchy for reporting and applies a parsimony algorithm to report the smallest possible number of proteins that explains the observed peptides. This protocol details the use of these tools for new users.
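The two filtering ideas in this protocol, target-decoy filtering and parsimonious protein assembly, can be illustrated with a short sketch. This is not IDPicker's implementation. The FDR estimate used here (decoy hits divided by target hits at or above a score cutoff) is one common convention. The greedy set cover only approximates the smallest protein set, and IDPicker's own algorithm may differ. The PSM tuple layout and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical PSM record: (peptide, score, is_decoy); higher score = better.
PSM = Tuple[str, float, bool]


def filter_by_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return target PSMs above the loosest rank cutoff whose estimated FDR
    (decoys / targets among PSMs at or above the cutoff) is <= max_fdr.
    Ties in score are not handled specially in this sketch."""
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)
    targets = decoys = 0
    accepted = 0  # number of top-ranked PSMs retained
    for i, (_, _, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            accepted = i + 1
    # Decoys are only used for estimation; report the surviving targets.
    return [p for p in ranked[:accepted] if not p[2]]


def parsimonious_proteins(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the protein that explains the most
    still-unexplained peptides until every peptide is explained."""
    remaining = set().union(*protein_to_peptides.values())
    chosen: List[str] = []
    while remaining:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & remaining))
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen


psms = [("PEPTIDEA", 10, False), ("PEPTIDEB", 9, False), ("DECOY1", 8, True),
        ("PEPTIDED", 7, False), ("DECOY2", 6, True)]
print([p[0] for p in filter_by_fdr(psms, max_fdr=0.5)])  # ['PEPTIDEA', 'PEPTIDEB', 'PEPTIDED']
print(parsimonious_proteins({"P1": {"a", "b", "c"}, "P2": {"a", "b"},
                             "P3": {"c", "d"}}))  # ['P1', 'P3']
```

In the example, P2 is dropped because P1 already explains both of its peptides. This is the kind of redundancy that a parsimony step is meant to remove from a protein report.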
Project description: To assess the impact of surface water along the Hun River, several sampling sites in the mainstream and the tributary were selected to represent the pollution gradient and different pollution sources. Human mesenchymal stem cells were exposed for 2 days to organic extracts of surface water from six sites, and gene expression was measured with microarrays. The resulting gene expression profiles were used to evaluate their ability to determine potential biological effects, to differentiate pollution sources, and to identify toxic components.
Project description: In liquid chromatography-mass spectrometry (LC-MS), parts of LC peaks are often corrupted by co-eluting peptides, which increases quantification variance. In this paper, we propose applying accurate LC peak boundary detection to remove the corrupted parts of LC peaks. Accurate boundary detection is achieved by checking the consistency of intensity patterns within each peptide's elution time range. In addition, we remove peptides with erroneous mass assignments through a model fitness check, which compares observed intensity patterns with theoretically constructed ones. The proposed algorithm can significantly improve the accuracy and precision of peptide ratio measurements.
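One way to picture boundary detection by intensity-pattern consistency is to start at the peak apex and extend the boundaries one scan at a time. Extension continues while each scan's isotope intensity pattern remains similar to the theoretical pattern and stops where a co-eluting peptide distorts it. The sketch below is an illustrative interpretation under stated assumptions, not the paper's algorithm. It assumes that per-scan isotope intensity vectors and a theoretical isotope pattern are available, and it uses cosine similarity with a hypothetical threshold of 0.9. It also applies the same similarity test at the apex as a crude stand-in for the model fitness check.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two intensity patterns (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def trim_peak(
    scans: List[Sequence[float]],
    reference: Sequence[float],
    min_similarity: float = 0.9,  # hypothetical threshold
) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) scan indices of the uncorrupted peak region.

    `scans` holds one isotope-intensity vector (M0, M1, ...) per MS1 scan,
    ordered by retention time; `reference` is the theoretical isotope pattern.
    Returns None when even the apex does not match the reference, a crude
    stand-in for rejecting a peptide with an erroneous mass assignment.
    """
    apex = max(range(len(scans)), key=lambda i: sum(scans[i]))
    if cosine(scans[apex], reference) < min_similarity:
        return None
    start = end = apex
    # Extend left, then right, while the isotope pattern stays consistent.
    while start > 0 and cosine(scans[start - 1], reference) >= min_similarity:
        start -= 1
    while end < len(scans) - 1 and cosine(scans[end + 1], reference) >= min_similarity:
        end += 1
    return start, end


reference = [1.0, 0.5, 0.2]
scans = [
    [0.1, 0.9, 0.1],  # leading edge distorted by a co-eluting peptide
    [1.0, 0.5, 0.2],
    [4.0, 2.0, 0.8],  # apex
    [2.0, 1.0, 0.4],
    [0.2, 0.1, 0.6],  # trailing edge distorted
]
print(trim_peak(scans, reference))  # (1, 3)
```

Only scans 1 to 3 would then contribute to the peptide's quantity, so the distorted edges no longer inflate the variance of the ratio.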
Project description: Background: Untargeted metabolomics datasets contain large proportions of uninformative features that can impede subsequent statistical analysis such as biomarker discovery and metabolic pathway analysis. Thus, there is a need for versatile and data-adaptive methods for filtering data prior to investigating the underlying biological phenomena. Here, we propose a data-adaptive pipeline for filtering metabolomics data that are generated by liquid chromatography-mass spectrometry (LC-MS) platforms. Our data-adaptive pipeline includes novel methods for filtering features based on blank samples, proportions of missing values, and estimated intra-class correlation coefficients. Results: Using metabolomics datasets that were generated in our laboratory from samples of human blood, as well as two public LC-MS datasets, we compared our data-adaptive filtering method with traditional methods that rely on non-method-specific thresholds. The data-adaptive approach outperformed traditional approaches in terms of removing noisy features and retaining high-quality, biologically informative ones. The R code for running the data-adaptive filtering method is provided at https://github.com/courtneyschiffman/Metabolomics-Filtering . Conclusions: Our proposed data-adaptive filtering pipeline is intuitive and effectively removes uninformative features from untargeted metabolomics datasets. It is particularly relevant for interrogation of biological phenomena in data derived from complex matrices associated with biospecimens.
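The three filtering criteria named above (blank samples, missing-value proportion, and intra-class correlation) can be sketched for a single feature as follows. This is only an illustration. It uses fixed, hypothetical thresholds, which are exactly the kind of non-method-specific cutoffs that the data-adaptive pipeline is designed to replace, and it does not reproduce the authors' threshold estimation (see their R code for that). The ICC here is the one-way random-effects ICC(1) for a balanced replicate design. The authors' ICC estimate may use a different model. The function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def icc_oneway(groups: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1) for a balanced design: n groups
    (e.g. biological samples), each measured k times (technical replicates).
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    n = len(groups)
    k = len(groups[0]) if n else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 groups with the same number (>= 2) of replicates")
    grand = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    # A constant feature has no variance at all; treat it as uninformative.
    return (msb - msw) / denom if denom else 0.0


def keep_feature(
    samples: Sequence[Optional[float]],
    blanks: Sequence[float],
    replicate_groups: Sequence[Sequence[float]],
    max_missing: float = 0.5,  # hypothetical fixed threshold
    blank_fold: float = 3.0,   # hypothetical fixed threshold
    min_icc: float = 0.5,      # hypothetical fixed threshold
) -> bool:
    """Apply missing-value, blank, and ICC filters to one feature.
    None marks a missing intensity in `samples`."""
    observed = [x for x in samples if x is not None]
    if not observed or 1 - len(observed) / len(samples) > max_missing:
        return False
    # Signal must clearly exceed what is seen in process blanks.
    if blanks and mean(observed) <= blank_fold * mean(blanks):
        return False
    # Replicates of the same sample should agree more closely than
    # measurements of different samples.
    return icc_oneway(replicate_groups) >= min_icc


print(keep_feature([100.0, 120.0, None, 110.0], [5.0, 6.0],
                   [[100.0, 102.0], [120.0, 118.0], [110.0, 111.0]]))  # True
```

A data-adaptive version would replace each hard-coded default with a value estimated from the dataset itself, for example from the distribution of blank intensities or of ICC values across all features.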
Project description: The accurate processing of complex liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) data from biological samples is a major challenge for metabolomics, proteomics, and related approaches. Here, we present the pipelines and systems for threshold-avoiding quantification (PASTAQ) LC-MS/MS preprocessing toolset, which enables highly accurate quantification of data-dependent acquisition LC-MS/MS datasets. PASTAQ quantifies compounds using single-stage (MS1) data and implements novel algorithms for high-performance, accurate quantification, retention time alignment, feature detection, and linking annotations from multiple identification engines. PASTAQ offers straightforward parameterization and automatically generates quality control plots for assessing the data and the preprocessing. This design results in smaller variance when analyzing replicates of proteomes mixed at known ratios and allows peptides to be detected over a larger dynamic concentration range than widely used proteomics preprocessing tools. The pipeline's performance is also demonstrated on a biological human serum dataset for the identification of gender-related proteins.
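Retention time alignment is one of the steps listed above. The sketch below is a much simpler stand-in than PASTAQ's own alignment algorithm, which is not reproduced here. It fits a linear mapping from one run's retention times to a reference run's, using peptides identified in both runs. Real LC runs often drift nonlinearly, which is why dedicated tools use more flexible warping. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_rt_alignment(
    sample_rts: Sequence[float], reference_rts: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least-squares fit of reference_rt ~ slope * sample_rt + intercept,
    using retention times of peptides identified in both runs."""
    if len(sample_rts) != len(reference_rts) or len(sample_rts) < 2:
        raise ValueError("need at least two paired retention times")
    mx, my = mean(sample_rts), mean(reference_rts)
    sxx = sum((x - mx) ** 2 for x in sample_rts)
    if sxx == 0:
        raise ValueError("sample retention times must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(sample_rts, reference_rts))
    slope = sxy / sxx
    return slope, my - slope * mx


def align_rt(rt: float, slope: float, intercept: float) -> float:
    """Map a sample retention time onto the reference run's time axis."""
    return slope * rt + intercept


# Anchor peptides: the sample run elutes slightly earlier than the reference.
slope, intercept = fit_rt_alignment([10, 20, 30, 40], [11.5, 22.5, 33.5, 44.5])
print(round(align_rt(25.0, slope, intercept), 3))  # 28.0
```

After alignment, features from different runs can be matched within a narrow retention time window. A single global line is the simplest possible model, and it will underfit runs with local gradient distortions.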