MetaLab: an automated pipeline for metaproteomic data analysis.
ABSTRACT: Research involving microbial ecosystems has drawn increasing attention in recent years. Studying microbe-microbe, host-microbe, and environment-microbe interactions is essential for understanding microbial ecosystems. Metaproteomics provides qualitative and quantitative information on proteins, offering insights into the functional changes of microbial communities. However, computational analysis of the large-scale data generated in metaproteomic studies remains a challenge. Conventional proteomic software has difficulty dealing with the extreme complexity and species diversity present in microbiome samples, leading to lower rates of peptide and protein identification. To address this issue, we previously developed the MetaPro-IQ approach for highly efficient microbial protein/peptide identification and quantification. Here, we developed an integrated software platform, named MetaLab, which provides a complete, automated, and user-friendly pipeline for fast microbial protein identification and quantification, as well as taxonomic profiling, directly from mass spectrometry raw data. Spectral clustering in the pre-processing step dramatically improves the speed of peptide identification from database searches. Quantitative information from identified peptides is used to estimate the relative abundance of taxa at all phylogenetic ranks. Taxonomy result files exported by MetaLab are fully compatible with widely used metagenomics tools. Herein, the potential of MetaLab is evaluated by reanalyzing a metaproteomic dataset from mouse gut microbiome samples. MetaLab is a fully automated software platform enabling an integrated data-processing pipeline for metaproteomics. Its sample-specific database generation is particularly advantageous when searching peptides against very large protein databases.
It provides a seamless connection between peptide identification and taxonomic profiling; peptide abundances can therefore be used directly to measure microbial variation. MetaLab is designed as a versatile, efficient, and easy-to-use tool that greatly simplifies metaproteomic data analysis for researchers in microbiome studies.
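The taxonomic profiling idea described above, deriving taxon abundances from peptide quantities, can be sketched as follows. This is a minimal illustration with hypothetical data and function names, not MetaLab's actual code; it assumes each peptide has already been assigned a taxonomic lineage (e.g., via lowest common ancestor analysis) and a quantified intensity.

```python
from collections import defaultdict

# Hypothetical peptide records: (lineage as rank -> taxon name, intensity).
peptides = [
    ({"phylum": "Firmicutes", "genus": "Lactobacillus"}, 3.0e6),
    ({"phylum": "Firmicutes", "genus": "Clostridium"}, 1.0e6),
    ({"phylum": "Bacteroidetes", "genus": "Bacteroides"}, 4.0e6),
]

def taxon_abundance(peptides, rank):
    """Sum peptide intensities per taxon at the given phylogenetic
    rank, then normalize to relative abundance."""
    totals = defaultdict(float)
    for lineage, intensity in peptides:
        name = lineage.get(rank)
        if name is not None:  # peptide is resolved at this rank
            totals[name] += intensity
    grand = sum(totals.values())
    return {name: v / grand for name, v in totals.items()}

print(taxon_abundance(peptides, "phylum"))
# {'Firmicutes': 0.5, 'Bacteroidetes': 0.5}
```

Because the same intensities can be re-aggregated at any rank, one pass over the peptide table yields profiles for every phylogenetic level.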
Project description:Quick and accurate identification of microbial pathogens is essential for both diagnosis of and response to emerging infectious diseases. The advent of next-generation sequencing technology offers an unprecedented platform for rapid, sequencing-based identification of novel viruses. We have developed a customized bioinformatics data analysis pipeline, VirusHunter, for the analysis of Roche/454 and other long-read next-generation sequencing platform data. To illustrate the utility of VirusHunter, we performed Roche/454 GS FLX Titanium sequencing on two unclassified virus isolates from the World Reference Center for Emerging Viruses and Arboviruses (WRCEVA). VirusHunter identified sequences derived from a novel bunyavirus and a novel reovirus in the two samples. Further sequence analysis demonstrated that the viruses were novel members of the Phlebovirus and Orbivirus genera, respectively. Both genera include many economically important viruses and serious human pathogens.
Project description:Targeted sequencing of genomic regions is a cost- and time-efficient approach for screening patient cohorts. We present a fast and efficient workflow to analyze highly imbalanced, targeted next-generation sequencing data generated using molecular inversion probe (MIP) capture. Our Snakemake pipeline performs sample demultiplexing, merging of overlapping paired-end reads, alignment, MIP-arm trimming, variant calling, coverage analysis, and report generation. Further, we support the analysis of probes specifically designed to capture certain structural variants and can assign sex using Y-chromosome-unique probes. In a user-friendly HTML report, we summarize all of these results, including covered, incomplete, or missing regions, called variants, and their predicted effects. We developed and tested our pipeline using the hemophilia A & B MIP design from the "My Life, Our Future" initiative. HemoMIPs is available as an open-source tool on GitHub at: https://github.com/kircherlab/hemoMIPs.
Project description:Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users often face issues such as software incompatibility, complicated configuration, and lack of access to high-performance computing facilities, and discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers and facilitates normalization/intersection of variant calls to generate a high-confidence consensus set. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations, all through a one-line command, allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turnaround is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and provides quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
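The intersection step behind the consensus call set can be illustrated with a short sketch. This is a generic voting scheme over variant calls, not SeqMule's actual code; the caller names and variant tuples below are hypothetical.

```python
# Illustrative consensus calling: keep variants reported by at least
# `min_support` of the individual callers. A variant is identified by
# its (chromosome, position, reference allele, alternate allele) tuple.
def consensus_calls(callers, min_support=2):
    counts = {}
    for calls in callers.values():
        for variant in calls:
            counts[variant] = counts.get(variant, 0) + 1
    return {v for v, n in counts.items() if n >= min_support}

callers = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "caller_b": {("chr1", 100, "A", "G")},
    "caller_c": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
}
print(consensus_calls(callers, min_support=2))
# only chr1:100 A>G is supported by at least two callers
```

Raising `min_support` trades sensitivity for specificity, which is the mechanism behind the Mendelian-error and consistency improvements reported for the consensus set.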
Project description:Non-targeted analysis is nowadays applied in many domains of analytical chemistry, such as metabolomics, environmental analysis, and food analysis. Conventional processing strategies for GC-MS data include baseline correction, feature detection, and retention time alignment before multivariate modeling. These techniques can be prone to errors, so time-consuming manual corrections are generally necessary. We introduce here a novel, fully automated approach to non-targeted GC-MS data processing that avoids feature extraction and retention time alignment. Supervised machine learning on decomposed tensors of the segmented raw chromatographic signal is used to rank the regions of the chromatograms that contribute most to differentiation between sample classes. The performance of this data analysis approach is demonstrated on three published datasets.
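The segment-ranking idea can be shown with a toy sketch. The published approach trains supervised models on decomposed tensors; here, as a deliberately simplified stand-in, each segment is scored with a t-like statistic on its summed signal. All data and names are illustrative.

```python
import statistics

def rank_segments(class_a, class_b):
    """Rank chromatogram segments by how well their summed signal
    separates two sample classes. class_a/class_b: one list of
    per-segment sums per sample."""
    n_seg = len(class_a[0])
    scores = []
    for s in range(n_seg):
        a = [sample[s] for sample in class_a]
        b = [sample[s] for sample in class_b]
        pooled = statistics.stdev(a + b) or 1.0  # guard zero spread
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / pooled)
    # highest score = segment most discriminative between classes
    return sorted(range(n_seg), key=scores.__getitem__, reverse=True)

ctrl = [[1.0, 5.0, 2.1], [1.2, 5.1, 2.0]]   # two control samples
case = [[1.1, 9.0, 2.2], [0.9, 9.2, 1.9]]   # two case samples
print(rank_segments(ctrl, case))
# segment 1 (signal ~5 vs ~9) ranks first
```

Because ranking operates on fixed segments of the raw signal, no peak detection or retention time alignment is required, which is the point of the approach.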
Project description:Single-molecule binding assays enable the study of how molecular machines assemble and function. Current algorithms can identify and locate individual molecules, but they require tedious manual validation of each spot. Moreover, no solution exists for high-throughput analysis of single-molecule binding data. Here, we describe an automated pipeline to analyze single-molecule data over a wide range of experimental conditions. In addition, our method enables state estimation on multivariate Gaussian signals. We validate our approach using simulated data and benchmark the pipeline by measuring the binding properties of the well-studied, DNA-guided DNA endonuclease TtAgo, an Argonaute protein from the eubacterium Thermus thermophilus. We also use the pipeline to extend our understanding of TtAgo by measuring the protein's binding kinetics at physiological temperatures and for target DNAs containing multiple, adjacent binding sites.
Project description:We present a fully automatic pipeline for the analysis of PET data on the cortical surface. Our pipeline combines tools from FreeSurfer and PETPVC, and consists of (i) co-registration of PET and T1-w MRI (T1) images, (ii) intensity normalization, (iii) partial volume correction, (iv) robust projection of the PET signal onto the subject's cortical surface, (v) spatial normalization to a template, and (vi) atlas statistics. We evaluated the performance of the proposed workflow by performing group comparisons and showed that the approach was able to identify the areas of hypometabolism characteristic of different dementia syndromes: Alzheimer's disease (AD) and both the semantic and logopenic variants of primary progressive aphasia. We also showed that these results were comparable to those obtained with a standard volume-based approach. We then performed individual classifications and showed that vertices can be used as features to differentiate cognitively normal and AD subjects. This pipeline is integrated into Clinica, an open-source software platform for neuroscience studies available at www.clinica.run.
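Step (ii) of the surface-based PET pipeline, intensity normalization, can be sketched in a few lines. The function and data below are illustrative assumptions, not Clinica's actual API: the image is scaled so that mean uptake in a reference region equals 1, yielding SUVR-like values comparable across subjects.

```python
# Hypothetical intensity normalization: divide every voxel value by
# the mean uptake within a reference region (e.g., pons), so the
# reference mean becomes 1.0.
def normalize_intensity(pet_values, reference_mask):
    ref = [v for v, in_ref in zip(pet_values, reference_mask) if in_ref]
    scale = sum(ref) / len(ref)
    return [v / scale for v in pet_values]

pet = [2.0, 4.0, 6.0, 8.0]
mask = [True, True, False, False]  # reference region: first two voxels
normalized = normalize_intensity(pet, mask)
# reference mean is 3.0, so all values are divided by 3.0
```

After this step, hypometabolism shows up as reference-relative values below those of controls, which is what the group comparisons in the study detect.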
Project description:The application of small angle X-ray scattering (SAXS) to the structural characterization of transmembrane proteins (MPs) in detergent solutions has become a routine procedure at synchrotron BioSAXS beamlines around the world. SAXS provides overall parameters and low resolution shapes of solubilized MPs, but is also meaningfully employed in hybrid modeling procedures that combine scattering data with information provided by high-resolution techniques (e.g., macromolecular crystallography, nuclear magnetic resonance, and cryo-electron microscopy). Structural modeling of MPs from SAXS data is non-trivial, and the necessary computational procedures require further formalization and facilitation. We propose an automated pipeline integrated with the laboratory information management system ISPyB, aimed at preliminary SAXS analysis and first-step reconstruction of MPs in detergent solutions, in order to streamline high-throughput studies, especially at synchrotron beamlines. The pipeline queries an ISPyB database for available a priori information via dedicated services, estimates model-free SAXS parameters, and generates preliminary models utilizing either ab initio, high-resolution-based, or mixed/hybrid methods. The results of the automated analysis can be inspected online using the standard ISPyB interface, and the estimated modeling parameters may be utilized for further in-depth modeling beyond the pipeline. Examples of the pipeline results for the modeling of the tetrameric alpha-helical membrane channel Aquaporin0 and the mechanosensitive channel T2, solubilized by n-Dodecyl β-D-maltoside, are presented. We demonstrate how increasing the amount of a priori information improves model resolution and enables deeper insights into the molecular structure of protein-detergent complexes.
Project description:Quantitative susceptibility mapping (QSM) is an MRI-based, computational method for anatomically localizing and measuring concentrations of specific biomarkers in tissue, such as iron. Growing research suggests QSM is a viable method for evaluating the impact of iron overload in neurological disorders and on cognitive performance in aging. Several software toolboxes are currently available to reconstruct QSM maps from 3D GRE MR images. However, few if any software packages currently offer fully automated pipelines for QSM-based data analyses: from DICOM images to region-of-interest (ROI) based QSM values. Even fewer offer quality-control measures for evaluating the QSM output. Here, we address these gaps in the field by introducing and demonstrating the reliability and external validity of Ironsmith: an open-source, fully automated pipeline for creating and processing QSM maps, extracting QSM values from subcortical and cortical brain regions (89 ROIs), and evaluating the quality of QSM data using SNR measures and assessment of outlier regions on phase images. Ironsmith also features automatic filtering of QSM outlier values and precise CSF-only QSM reference masks that minimize partial volume effects. Testing of Ironsmith revealed excellent intra- and inter-rater reliability. Finally, external validity of Ironsmith was demonstrated via an anatomically selective relationship between motor performance and Ironsmith-derived QSM values in motor cortex. In sum, Ironsmith provides a freely available, reliable, turn-key pipeline for QSM-based data analyses to support research on the impact of brain iron in aging and neurodegenerative disease.
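The kind of automatic outlier filtering described above can be sketched with a robust median/MAD rule. This is a hedged illustration of the general technique, not Ironsmith's actual implementation; the threshold, function name, and example values are assumptions.

```python
import statistics

def filter_outliers(values, n_mads=3.0):
    """Drop values more than n_mads median-absolute-deviations
    from the median (a robust alternative to mean +/- k*SD)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:  # no spread -> nothing to flag
        return list(values)
    return [v for v in values if abs(v - med) / mad <= n_mads]

# Hypothetical ROI QSM values (ppm); the last one is an outlier.
roi_qsm = [0.012, 0.015, 0.013, 0.014, 0.250]
print(filter_outliers(roi_qsm))
# the 0.250 value is removed; the rest are kept in order
```

A median/MAD rule is preferred here because a single extreme ROI value would inflate a mean/SD threshold enough to mask itself.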
Project description:Technical advances in Next Generation Sequencing (NGS) provide a means to acquire deeper insights into cellular functions, but the lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best-performing data analysis, data evaluation, and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD, and Solexa). Comparative transcriptomics analyses with TRAPLINE result in a set of differentially expressed genes, their corresponding protein-protein interactions, splice variants, promoter activity, predicted miRNA-target interactions, and files for single nucleotide polymorphism (SNP) calling. The obtained results are combined into a single file for downstream analysis such as network construction. We demonstrate the value of the proposed pipeline by characterizing the transcriptome of our recently described stem cell derived, antibiotic-selected cardiac bodies ('aCaBs'). TRAPLINE supports NGS-based research by providing a workflow that requires no bioinformatics skills, decreases the processing time of the analysis, and works in the cloud. The pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via www.sbi.uni-rostock.de/RNAseqTRAPLINE or the specific Galaxy manual page (https://usegalaxy.org/u/mwolfien/p/trapline---manual).
Project description:ChIA-PET (chromatin interaction analysis with paired-end tags) enables genome-wide discovery of chromatin interactions involving specific protein factors at base-pair resolution. Interpretation of ChIA-PET data requires a robust analytic pipeline. Here, we introduce ChIA-PIPE, a fully automated pipeline for ChIA-PET data processing, quality assessment, visualization, and analysis. ChIA-PIPE performs linker filtering, read mapping, peak calling, and loop calling, and automates quality-control assessment for each dataset. To enable visualization, ChIA-PIPE generates input files for two-dimensional contact map viewing with Juicebox and HiGlass, and provides a new dockerized visualization tool for high-resolution, browser-based exploration of peaks and loops. To enable structural interpretation, ChIA-PIPE calls chromatin contact domains, resolves allele-specific peaks and loops, and annotates enhancer-promoter loops. ChIA-PIPE also supports the analysis of other related chromatin-mapping data types.
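The linker-filtering step, the first stage of the pipeline above, can be illustrated with a minimal sketch. The linker sequence, read data, and function name below are hypothetical, not ChIA-PIPE's actual implementation: the idea is to locate the bridge linker in each read, keep the genomic tag preceding it, and discard reads where the linker is absent or the tag is too short to map.

```python
LINKER = "ACGCGATATCTTATC"  # hypothetical bridge-linker sequence

def extract_tag(read, linker=LINKER, min_tag=16):
    """Return the genomic tag preceding the linker, or None if the
    linker is missing or the tag is shorter than min_tag bases."""
    pos = read.find(linker)
    if pos < min_tag:  # covers both 'not found' (-1) and short tags
        return None
    return read[:pos]

reads = [
    "TTGACCAGTAGGCATG" + LINKER + "GGGG",  # tag + linker + remainder
    "TTGACCAGTAGGCATG",                    # no linker -> filtered out
]
tags = [t for t in (extract_tag(r) for r in reads) if t]
print(tags)
# only the first read yields a mappable tag
```

The surviving tags are what the subsequent read-mapping and peak-calling stages consume.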