Project description: FNR is a well-studied global regulator of anaerobiosis that is widely conserved across bacteria. Despite the importance of FNR and anaerobiosis in microbial lifestyles, the factors that influence its function on a genome-wide scale are poorly understood. Here, we report a functional genomic analysis of FNR action. We find that FNR occupancy at many target sites is strongly influenced by nucleoid-associated proteins (NAPs) that restrict access to FNR binding sites. At a genome-wide level, only a subset of predicted FNR binding sites was bound under anaerobic fermentative conditions, and many appeared to be masked by the NAPs H-NS, IHF and Fis. Similar assays in cells lacking H-NS and its paralog StpA showed increased FNR occupancy at sites bound by H-NS in WT strains, indicating that large regions of the genome are not readily accessible for FNR binding. Genome accessibility may also explain our finding that genome-wide FNR occupancy did not correlate with the match to consensus at binding sites. This suggests that much of the variation in ChIP signal was attributable to cross-linking or immunoprecipitation efficiency rather than to differences in binding affinity for FNR sites. Correlation of FNR ChIP-seq peaks with transcriptomic data showed that fewer than half of the FNR-regulated operons could be attributed to direct FNR binding. Conversely, FNR bound some promoters without regulating expression, presumably because regulation at these promoters also requires changes in the activity of condition-specific transcription factors. Such combinatorial regulation may allow Escherichia coli to respond rapidly to environmental changes and confer an ecological advantage in the anaerobic but nutrient-fluctuating environment of the mammalian gut.
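The "match to consensus" mentioned above can be made concrete by counting mismatches to the well-known FNR consensus TTGAT-N4-ATCAA. The sketch below is illustrative only: the description does not say how the match was scored in the study, and the names `FNR_CONSENSUS`, `consensus_mismatches` and `best_match` are hypothetical. Because the consensus is an inverted repeat, scoring the given strand is equivalent to scoring its reverse complement.

```python
# Hypothetical sketch: score candidate sites against the FNR consensus.
# N positions in the consensus are spacer bases and are not scored.
FNR_CONSENSUS = "TTGATNNNNATCAA"


def consensus_mismatches(site: str, consensus: str = FNR_CONSENSUS) -> int:
    """Count scored positions where `site` differs from `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(
        1
        for s, c in zip(site.upper(), consensus)
        if c != "N" and s != c
    )


def best_match(seq: str, consensus: str = FNR_CONSENSUS) -> tuple[int, int]:
    """Slide over `seq`; return (fewest mismatches, 0-based start of that window).

    The consensus is palindromic, so one strand suffices.
    Returns (len(consensus) + 1, -1) if `seq` is shorter than the consensus.
    """
    k = len(consensus)
    best = (k + 1, -1)
    for i in range(len(seq) - k + 1):
        score = consensus_mismatches(seq[i:i + k], consensus)
        if score < best[0]:  # keep the first window with the lowest score
            best = (score, i)
    return best


print(best_match("GGTTGATAAAAATCAAGG"))  # (0, 2)
```

A per-site mismatch count like this could then be compared with ChIP signal at each bound site; the finding above is that no such correlation was observed.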
Project description: Using chromatin immunoprecipitation (ChIP) and high-density microarrays, we have measured the distribution of the global transcription regulator protein, FNR, across the entire Escherichia coli chromosome in exponentially growing cells. Sixty-three binding targets, each located at the 5' end of a gene, were identified. Some targets are adjacent to poorly transcribed genes where FNR has little impact on transcription. In stationary phase, the distribution of FNR was largely unchanged. Control experiments showed that, like FNR, the distribution of the nucleoid-associated protein, IHF, is little altered when cells enter stationary phase, whilst RNA polymerase undergoes a complete redistribution.
Project description: Uncovering the mechanisms that affect the binding specificity of transcription factors (TFs) is critical for understanding the principles of gene regulation. Although sequence-based models have been used successfully to predict TF binding specificities, we found that including DNA shape information in these models improved their accuracy and interpretability. Previously, we developed a method for modeling DNA binding specificities based on DNA shape features extracted from Monte Carlo (MC) simulations. However, the prediction accuracies of our models have not yet been compared with those of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations. Here, we integrated DNA shape information extracted from MC simulations, MD simulations or XRC data into predictive models of TF binding and compared their performance. Models that incorporated structural information consistently outperformed sequence-based models, regardless of the data source. Furthermore, we derived and validated nine additional DNA shape features beyond our original set of four. The expanded repertoire of 13 distinct DNA shape features, comprising six intra-base pair parameters, six inter-base pair parameters and minor groove width, is available in our R/Bioconductor package DNAshapeR and enables a comprehensive structural description of the double helix on a genome-wide scale.
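The repertoire of 13 shape features and the idea of adding shape to a sequence-based model can be sketched as a simple feature vector: one-hot encoded sequence followed by shape values. This is a minimal sketch under stated assumptions, not the authors' model. The feature abbreviations follow DNAshapeR's naming as we understand it (ProT = propeller twist, HelT = helix twist, MGW = minor groove width). The `shape_values` argument is a hypothetical dictionary of per-position values that would in practice come from a tool such as DNAshapeR. The helper names are also hypothetical.

```python
# Illustrative grouping of the 13 DNA shape features (6 + 6 + 1).
INTRA_BP = ["Shear", "Stretch", "Stagger", "Buckle", "ProT", "Opening"]
INTER_BP = ["Shift", "Slide", "Rise", "Tilt", "Roll", "HelT"]
GROOVE = ["MGW"]
SHAPE_FEATURES = INTRA_BP + INTER_BP + GROOVE


def one_hot(seq: str) -> list[float]:
    """Encode each base as four indicator values in A, C, G, T order."""
    alphabet = "ACGT"
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in alphabet]


def sequence_plus_shape(seq: str, shape_values: dict[str, list[float]]) -> list[float]:
    """Concatenate one-hot sequence features with shape features.

    shape_values: hypothetical mapping from feature name to per-position
    values. Features are appended in the fixed order of SHAPE_FEATURES;
    any feature missing from the mapping is skipped.
    """
    vector = one_hot(seq)
    for name in SHAPE_FEATURES:
        vector.extend(shape_values.get(name, []))
    return vector
```

Fixing the feature order keeps vectors from different sequences aligned, so they can be fed to any standard regression or classification model. A sequence-only baseline is simply `one_hot(seq)`.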
Project description: Zymomonas mobilis is an ethanologenic alphaproteobacterium with promise for the industrial conversion of renewable plant biomass into fuels and chemical bioproducts. Limited functional annotation of the Z. mobilis genome is a current barrier both to fundamental studies of Z. mobilis and to its development as a synthetic biology chassis. To gain insight, we collected sample-matched multiomics data, including RNA sequencing (RNA-seq), transcription start site (TSS) sequencing (TSS-seq), termination sequencing (term-seq), ribosome profiling, and label-free shotgun proteomic mass spectrometry, across different growth conditions and used these data to improve annotation and assign functional sites in the Z. mobilis genome. Proteomics and ribosome profiling informed revisions of protein-coding genes, which included 44 start codon changes and 42 added proteins. We developed statistical methods for annotating transcript 5' and 3' ends, enabling the identification of 3,940 TSSs and their corresponding promoters and 2,091 transcription termination sites, which were distinguished from RNA processing sites by the lack of an adjacent RNA 5' end. Our results revealed that Z. mobilis σA -35 and -10 promoter elements closely resemble canonical Escherichia coli -35 and -10 elements, with one notable exception: the Z. mobilis -10 element lacks the highly conserved -7 thymine observed in E. coli and other previously characterized σA promoters. The σA promoters of another alphaproteobacterium, Caulobacter crescentus, similarly lack conservation of the -7 thymine in their -10 elements. Our results anchor the development of Z. mobilis as a platform for synthetic biology and establish strategies for empirical genome annotation that can complement purely computational methods. IMPORTANCE Efforts to rationally engineer synthetic pathways in Zymomonas mobilis are impeded by a lack of knowledge and tools for predictable and quantitative programming of gene regulation at the transcriptional, posttranscriptional, and posttranslational levels. With the detailed functional characterization of the Z. mobilis genome presented in this work, we provide crucial knowledge for the development of synthetic genetic parts tailored to Z. mobilis. This information is vital as researchers continue to develop Z. mobilis for synthetic biology applications. Our methods and statistical analyses also provide ways to rapidly advance the understanding of poorly characterized bacteria via empirical data that enable the experimental validation of sequence-based predictions for genome characterization and annotation.
Project description: Limited functional annotation of the Zymomonas mobilis genome is a current barrier both to basic studies of Z. mobilis and to its development as a synthetic-biology chassis. To gain insight, we collected sample-matched multiomics data, including RNA-seq, transcription start site sequencing (TSS-seq), termination sequencing (term-seq), ribosome profiling, and label-free shotgun proteomic mass spectrometry, across different growth conditions to improve annotation and assign functional sites in the Z. mobilis genome. Proteomics and ribosome profiling informed revisions of protein-coding genes, which included 44 start codon changes and 42 added proteins.
Project description: Bacterial RNA polymerases must associate with a σ factor to bind promoter DNA and initiate transcription. There are two families of σ factors: the σ70 family and the σ54 family. Members of the σ54 family are distinct in their ability to bind promoter DNA sequences, in the context of the RNA polymerase holoenzyme, in a transcriptionally inactive state. Here, we map the genome-wide association of Escherichia coli σ54, the archetypal member of the σ54 family. In doing so, we vastly expand the list of known σ54 binding sites, to 135. Moreover, we estimate that there are more than 250 σ54 sites in total. Strikingly, the majority of σ54 binding sites are located inside genes. The locations and orientations of intragenic σ54 binding sites are non-random, and many intragenic σ54 binding sites are conserved. We conclude that many intragenic σ54 binding sites are likely to be functional. Consistent with this conclusion, we identify three conserved, intragenic σ54 promoters that drive transcription of mRNAs with unusually long 5' UTRs.
Project description: Despite almost 40 years of molecular genetics research in Escherichia coli, a major fraction of its transcription start sites (TSSs) are still unknown, thereby limiting our understanding of the regulatory circuits that control gene expression in this model organism. RegulonDB (http://regulondb.ccg.unam.mx/) aims to integrate the genetic regulatory network of E. coli K12 and has, until now, been an entirely bioinformatic project. In this work, we extended its aims by generating genome-scale experimental data on TSSs, promoters and regulatory regions. We implemented a modified 5' RACE protocol and an unbiased high-throughput pyrosequencing strategy (HTPS) that allowed us to map more than 1700 TSSs with high precision. Of these, about 230 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The remaining ca. 1500 TSSs belong to about 1000 different genes, many of them with no assigned function. We identified promoter sequences and the types of sigma factors that control the expression of about 80% of these genes. As expected, the housekeeping σ70 promoter was the most common type, followed by σ38. Most of the putative TSSs were located 20 to 40 nucleotides from the translational start site. Putative binding sites for transcription factors were detected upstream of many TSSs. For a few transcripts, riboswitches and small RNAs were found. Several genes also had additional TSSs within the coding region. Unexpectedly, the HTPS experiments revealed extensive antisense transcription, which probably has regulatory functions. The new information in RegulonDB, now comprising more than 2400 experimentally determined TSSs, improves the accuracy of promoter prediction, operon structures and regulatory networks, and provides valuable new information that will facilitate a global understanding of the complex and intricate regulatory network that operates in E. coli.
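The TSS-to-start-codon distance discussed above (typically 20 to 40 nucleotides) depends on strand orientation. The following is a minimal sketch of how it might be computed from genome coordinates. It is not the pipeline used in this work. The coordinate convention (1-based positions, with the start codon given by its first base on the gene's own strand) and the helper names `leader_length` and `in_typical_window` are assumptions made for illustration.

```python
def leader_length(tss: int, start_codon: int, strand: str) -> int:
    """Distance in nt from the TSS (+1) to the first base of the start codon.

    Assumed convention: 1-based genome coordinates; on the '-' strand the
    start codon position is its first base read on that strand, i.e. the
    highest coordinate of the codon. A negative result means the TSS lies
    downstream of the start codon, e.g. inside the coding region.
    """
    if strand == "+":
        return start_codon - tss
    if strand == "-":
        return tss - start_codon
    raise ValueError("strand must be '+' or '-'")


def in_typical_window(length: int, low: int = 20, high: int = 40) -> bool:
    """True if a leader length falls in the inclusive [low, high] range."""
    return low <= length <= high


print(leader_length(100, 130, "+"))  # 30
print(leader_length(500, 470, "-"))  # 30
```

Returning a signed distance rather than an absolute one keeps TSSs inside coding regions, like the additional TSSs noted above, distinguishable from ordinary upstream TSSs.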
Project description: Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors to target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions of a single transcription factor with one or a few selected target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor, we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as targets. Examination of the binding targets of three related HOX factors (LIN-39, MAB-5, and EGL-5) indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, comprehensive mapping of transcription factor binding sites will identify features of the transcriptional networks that regulate C. elegans developmental processes.