Project description:Histone modifications are a key epigenetic mechanism to activate or repress the expression of genes. Data sets of matched microarray expression data and histone modification data measured by ChIP-seq exist, but methods for integrative analysis of both data types are still rare. Here, we present a novel bioinformatic approach to detect genes that are differentially expressed between two conditions putatively caused by alterations in histone modification. We introduce a correlation measure for integrative analysis of ChIP-seq and gene expression data and demonstrate that a proper normalization of the ChIP-seq data is crucial. We suggest applying Bayesian mixture models of different distributions to further study the distribution of the correlation measure. The implicit classification of the mixture models is used to detect genes with differences between two conditions in both gene expression and histone modification. The method is applied to different data sets and its superiority to a naive separate analysis of both data types is demonstrated. This GEO series contains the expression data of the Cebpa example data set. This data set was derived from sorted Cebpafl/fl and Cebpafl/fl;Mx1Cre murine hematopoietic LSKCD150- 18 post pIpC injections (conditional deletion of Cebpa). The specimens from three Cebpafl/fl and three Cebpafl/fl;Mx1Cre mice were hybridized separately on six Affymetrix Mouse Gene 1.0 ST arrays. Associated histone modification ChIP-seq data is provided by series GSE43007.
Project description:As systems biology approaches to virology have become more tractable, it has become possible to analyze highly studied viruses such as HIV in new, unbiased ways, including spatial proteomics. Here, we employed a differential centrifugation protocol to fractionate an inducible model of HIV expression in Jurkat T cells for proteomic analysis by mass spectrometry. Using these proteomics data, we evaluated the merits of several reported machine learning pipelines for classification of the spatial proteome and identification of protein translocations. From these analyses we found that classifier performance was organelle-dependent, with Bayesian t-augmented Gaussian mixture modeling outperforming support vector machine (SVM) learning for mitochondrial and ER proteins, but underperforming on cytosolic, nuclear, and plasma membrane proteins by QSep analysis. We also observed generally higher performance for protein translocation identification using a Bayesian model, BANDLE, on SVM-classified data. Comparative BANDLE analysis of WT and ΔNef models also identified known Nef-dependent interactors such as TCR signaling and coatomer complex. Lastly, we found that SVM classification showed higher consistency and was less sensitive to HIV-dependent noise in our data. These findings illustrate important considerations for future studies of the spatial proteome following viral infection or expression, where their generalizability can be further assessed.
Project description:Analysis of primary esophageal squamous cell carcinoma (ESCC) from 71 patients in Japan. Integrative analysis of gene expression profiles and genomic alterations obtained from array-CGH and NGS provided new insight into the pathogenesis of ESCC. Gene expression levels were obtained from 71 microdissected ESCC tumors. We used the commercially available Human Whole Genome Oligo DNA Microarray Kit (Agilent Technologies). Labeled cRNAs were fragmented and hybridized to an oligonucleotide microarray (Whole Human Genome 4×44K Agilent G4112F). Fluorescence intensities were determined with an Agilent DNA Microarray Scanner. The gene expression profiles (GE) obtained from the microarray data were quantile normalized. The batch effect in the microarray experiments was also adjusted using an empirical Bayesian approach.
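The normalization step described above, read here as standard quantile normalization, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `quantile_normalize`, the toy intensity values, and the tie-handling rule (ties broken by position) are assumptions made for illustration only.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one list per array).

    Hypothetical helper for illustration; ties are broken by position.
    """
    n = len(columns[0])
    # Sort each sample, then average across samples at each rank
    # to build the shared reference distribution.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]

    normalized = []
    for col in columns:
        # Indices of the probes ordered by intensity (stable sort).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        # Replace each value with the reference value at its rank.
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: three arrays measuring four probes (made-up intensities).
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every array shares the same set of values and differs only in which probe holds which value, which is the property that makes intensities comparable across arrays.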
Project description:We developed a computational framework that integrates chromosomal copy number and gene expression data for detecting aberrations that promote cancer progression. We demonstrate the utility of this framework using a melanoma dataset. Our analysis correctly identified known drivers of melanoma and predicted multiple novel tumor dependencies. Two dependencies, TBC1D16 and RAB27A, confirmed empirically, suggest that abnormal regulation of protein trafficking contributes to proliferation in melanoma. Together, these results demonstrate the ability of integrative Bayesian approaches to identify novel candidate drivers with biological, and possibly therapeutic, importance in cancer.
Project description:Increasing effects of anthropogenic stressors and those of natural origin on aquatic ecosystems have intensified the need for predictive and functional models of their effects. Here, we use gene expression patterns in combination with weighted gene co-expression networks and generalized additive models to predict effects on reproduction in the aquatic microcrustacean Daphnia. We developed models to predict effects on reproduction upon exposure to different cyanobacteria, different insecticides and binary mixtures of cyanobacteria and insecticides. Models developed specifically for groups of stressors (e.g. either cyanobacteria or insecticides) performed better than general models developed on all data. Furthermore, models developed using in silico generated mixture gene expression profiles from single stressor data predicted effects on reproduction better than models derived from the mixture exposures themselves. Our results highlight the potential of gene expression data to quantify higher-level organismal effects of complex exposures without prior mechanistic knowledge or complex exposure data.