Project description:Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making the two highly complementary. We leveraged this new workflow and these observations to build a large-scale TRN model for the α-proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall were also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis.
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
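The benchmark comparison above rests on the area under the precision-recall curve. As a minimal sketch of how such a score can be computed for a ranked list of predicted TF-target interactions, the Python below uses the step-wise (average precision) approximation. The function names, the edge labels (TF1, g1, ...) and the scores are hypothetical toy values, not data or code from the study, and the study may have used a different interpolation.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (transcription factor, target gene)


def precision_recall_curve(scores: Dict[Edge, float],
                           gold: Set[Edge]) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each prediction, ranked by descending score."""
    if not gold:
        raise ValueError("gold standard must contain at least one interaction")
    ranked = sorted(scores, key=scores.get, reverse=True)
    points = []
    true_positives = 0
    for rank, edge in enumerate(ranked, start=1):
        if edge in gold:
            true_positives += 1
        points.append((true_positives / len(gold), true_positives / rank))
    return points


def average_precision(scores: Dict[Edge, float], gold: Set[Edge]) -> float:
    """Step-wise area under the precision-recall curve (average precision)."""
    area = 0.0
    previous_recall = 0.0
    for recall, precision in precision_recall_curve(scores, gold):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Hypothetical toy example: four predicted interactions, two of them in the gold standard.
toy_scores = {
    ("TF1", "g1"): 0.9,
    ("TF1", "g2"): 0.8,
    ("TF2", "g3"): 0.7,
    ("TF2", "g4"): 0.6,
}
toy_gold = {("TF1", "g1"), ("TF2", "g3")}

print(average_precision(toy_scores, toy_gold))  # 0.5 * 1.0 + 0.5 * (2/3), about 0.833
```

A larger area means that true interactions are concentrated near the top of the ranking, which is the sense in which one inference approach can outperform another on the same benchmark.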
Project description:Rhodobacter sphaeroides is the best-studied photosynthetic bacterium, yet much remains unknown about its transcriptional regulatory processes on a genome scale. We developed a workflow for genome-scale reconstruction of transcriptional regulatory networks and applied it to sequence and gene expression data sets available for R. sphaeroides. To assess the predictive performance of our reconstructed model, we generated global transcript level and/or protein-DNA interaction data for 3 transcription factors (PpsR, RSP_0489 and RSP_3341). This dataset contains global transcript level analyses for RSP_0489 and RSP_3341 deletion strains, as well as matching wild-type controls. Microarray analysis was conducted, using the R. sphaeroides Affymetrix gene chip, for deletion strains of 2 previously uncharacterized transcription factors predicted to be involved in the regulation of carbon metabolism and iron homeostasis in R. sphaeroides. The expression profiles of these deletion mutants were compared to those of wild-type cells to determine the differentially expressed genes regulated by these transcription factors.
Project description:By integrating sequence information from closely related bacteria with a compendium of high-throughput gene expression datasets, a large-scale transcriptional regulatory network was constructed for Rhodobacter sphaeroides. Predictions from this network were validated in part using genome-wide analysis for 3 transcription factors (PpsR, RSP_0489 and RSP_3341). Genome-wide protein-DNA interaction analyses of 3 transcription factors predicted to be involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341) were used to validate predictions from a large-scale reconstruction of the R. sphaeroides transcriptional regulatory network.
Project description:BACKGROUND: Construction of transcriptional regulatory networks (TRNs) is a priority in systems biology. Numerous high-throughput approaches, including microarray and next-generation sequencing, are widely used to examine transcriptional expression patterns on the whole-genome scale; these data are helpful in reconstructing TRNs. Identifying transcription factor binding sites (TFBSs) in a gene promoter is the initial step in elucidating the transcriptional regulation mechanism. Because transcription factors usually co-regulate a common group of genes by forming regulatory modules with similar TFBSs, the combinatorial interactions of transcription factors must be modeled to reconstruct gene regulatory networks. DESCRIPTION: For systems biology applications, this work develops a novel database called Arabidopsis thaliana Promoter Analysis Net (AtPAN), capable of detecting TFBSs and their corresponding transcription factors (TFs) in a promoter or a set of promoters in Arabidopsis. For further analysis, co-expressed TFs and their target genes, identified from microarray expression data and the literature, can be retrieved from AtPAN. Additionally, proteins interacting with the co-expressed TFs are incorporated to reconstruct co-expressed TRNs. Moreover, combinatorial TFs can be detected by the frequency of TFBS co-occurrence in a group of gene promoters. TFBSs in the conserved regions between two input sequences, or between homologous genes in Arabidopsis and rice, are also provided in AtPAN. The output can also suggest wet-lab experiments to conduct in the future. CONCLUSIONS: AtPAN has a user-friendly input/output interface and provides a graphical view of the TRNs. This novel and creative resource is freely available online at http://AtPAN.itps.ncku.edu.tw/.
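The idea of detecting combinatorial TFs from the frequency of TFBS co-occurrence in a group of promoters can be illustrated with a short counting sketch. The Python below is not AtPAN's implementation, and the abstract does not state which statistic AtPAN uses; the function name, the promoter labels and the site labels (A, B, C) are hypothetical. It simply reports, for each pair of TFBS types, the fraction of promoters in the group that contain both.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def cooccurrence_frequencies(
    promoter_sites: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """Fraction of promoters in which each pair of TFBS types appears together."""
    if not promoter_sites:
        raise ValueError("at least one promoter is required")
    pair_counts: Counter = Counter()
    for sites in promoter_sites.values():
        # Sort so that each unordered pair always maps to the same key.
        for pair in combinations(sorted(sites), 2):
            pair_counts[pair] += 1
    total = len(promoter_sites)
    return {pair: count / total for pair, count in pair_counts.items()}


# Hypothetical toy input: TFBS types found in each promoter of a gene group.
toy_promoters = {
    "gene1": {"A", "B", "C"},
    "gene2": {"A", "B"},
    "gene3": {"A"},
    "gene4": {"B", "C"},
}

frequencies = cooccurrence_frequencies(toy_promoters)
for pair, freq in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(pair, freq)
```

Pairs with a high frequency in a co-expressed gene group are candidates for combinatorial regulation; a real analysis would also compare each frequency against a genome-wide background rather than rely on raw counts alone.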
Project description:Transcription regulation is a fundamental biological process, and extensive efforts have been made to dissect its mechanisms through direct biological experiments and regulation modeling based on physical-chemical principles and mathematical formulations. Despite these efforts, transcription regulation is not yet well understood because of its complexity and limitations in biological experiments. Recent advances in high-throughput technologies have provided substantial amounts and diverse types of genomic data that reveal valuable information on transcription regulation, including DNA sequence data, protein-DNA binding data, microarray gene expression data, and others. In this article, we propose a Bayesian error analysis model to integrate protein-DNA binding data and gene expression data to reconstruct transcriptional regulatory networks. There are two unique aspects to this proposed model. First, transcription is modeled as a set of biochemical reactions, and a linear system model with clear biological interpretation is developed. Second, measurement errors in both protein-DNA binding data and gene expression data are explicitly considered in a Bayesian hierarchical model framework. Model parameters are inferred through Markov chain Monte Carlo. The usefulness of this approach is demonstrated through its application to infer transcriptional regulatory networks in the yeast cell cycle.
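To make the phrase "inferred through Markov chain Monte Carlo" concrete, the sketch below runs a random-walk Metropolis sampler for a single regulatory strength in a one-parameter linear model with Gaussian measurement noise. This is a deliberately simplified stand-in, not the hierarchical error model from the article: it assumes one TF, a known noise level, a Gaussian prior, and no error in the predictor. All names, data values and settings are hypothetical.

```python
import math
import random
from typing import List, Sequence


def log_posterior(beta: float, activity: Sequence[float], expression: Sequence[float],
                  noise_sd: float, prior_sd: float) -> float:
    """Unnormalized log posterior for expression = beta * activity + Gaussian noise."""
    log_prior = -0.5 * (beta / prior_sd) ** 2
    log_likelihood = -0.5 * sum(
        ((y - beta * x) / noise_sd) ** 2 for x, y in zip(activity, expression)
    )
    return log_prior + log_likelihood


def metropolis(activity: Sequence[float], expression: Sequence[float],
               noise_sd: float = 0.5, prior_sd: float = 10.0, step_sd: float = 0.1,
               n_iter: int = 20000, burn_in: int = 2000, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler; returns post-burn-in samples of beta."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, activity, expression, noise_sd, prior_sd)
    samples = []
    for iteration in range(n_iter):
        proposal = beta + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, activity, expression, noise_sd, prior_sd)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if iteration >= burn_in:
            samples.append(beta)
    return samples


# Hypothetical toy data: TF activity and the expression of one target gene.
toy_activity = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_expression = [2.1, 3.9, 6.2, 7.8, 10.1]

samples = metropolis(toy_activity, toy_expression)
posterior_mean = sum(samples) / len(samples)
print(posterior_mean)
```

For this toy model the posterior is Gaussian and available in closed form, so the sampler can be checked against the analytic mean; in a hierarchical model with errors in both data types, no such closed form is generally available, which is why sampling is used.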