Project description:In eukaryotes, many transcription factors (TFs) form dimers or oligomers with themselves or with other TFs to regulate gene expression. To study how homo- and heterodimerization of TFs affect DNA binding specificity, we developed a double DNA Affinity Purification sequencing (double DAP-seq; dDAP-seq) technique that maps heterodimer binding sites in the endogenous genome context and applied it to elucidate the binding profiles of homo- and heterodimers of the Group C and S1 basic leucine zipper (bZIP) transcription factors in Arabidopsis. Genome-wide binding profiles of twenty pairs of bZIP C/S1 heterodimers and bZIP S1 homodimers revealed that heterodimerization significantly expands the DNA binding preferences of homodimers, creating unique binding sites and target gene functions. In addition to the classical ACGT elements recognized by plant bZIPs, we found that the Group C/S1 dimers bind to sequence motifs that might share an origin with the yeast bZIP GCN4. Further analysis of heterodimer-specific binding sequences uncovered two types of motif recognition patterns that mediate heterodimer specificity. Our study sheds light on the functions and mechanisms of TF dimerization and demonstrates the potential of dDAP-seq in deciphering the complexity of these interactions within or across TF families.
Project description:Transposase-Accessible Chromatin followed by sequencing (ATAC-seq) is a simple protocol for detection of open chromatin. Computational footprinting, the search for regions with depletion of cleavage events due to transcription factor binding, is poorly understood for ATAC-seq. We propose the first footprinting method considering ATAC-seq protocol artifacts. HINT-ATAC uses a position dependency model to learn the cleavage preferences of the transposase. We observe strand-specific cleavage patterns around transcription factor binding sites, which are determined by local nucleosome architecture. By incorporating all these biases, HINT-ATAC is able to significantly outperform competing methods in the prediction of transcription factor binding sites with footprints.
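The idea of a footprint as a local depletion of cleavage events can be illustrated with a minimal sketch. This is not the HINT-ATAC method: it omits the position dependency model, the bias correction and the strand-specific signals described above, and simply compares mean cleavage inside a candidate site with mean cleavage in its flanks. The function name `footprint_depletion`, the `flank` parameter and the toy signal are hypothetical.

```python
def footprint_depletion(cleavage, start, end, flank=10):
    """Mean flank cleavage minus mean cleavage inside [start, end).

    cleavage: per-base cleavage counts (list of numbers) for one region.
    A larger positive value means a stronger depletion inside the site.
    Illustrative score only; assumed here, not taken from HINT-ATAC.
    """
    if not (0 <= start < end <= len(cleavage)):
        raise ValueError("site must lie inside the signal")
    # Flanks are clipped at the edges of the signal.
    left = cleavage[max(0, start - flank):start]
    right = cleavage[end:end + flank]
    flanks = left + right
    if not flanks:
        raise ValueError("no flanking positions available")
    inside = cleavage[start:end]
    return sum(flanks) / len(flanks) - sum(inside) / len(inside)


if __name__ == "__main__":
    # Toy signal: high cleavage on both sides, depleted in positions 3..5.
    signal = [8, 9, 10, 1, 0, 1, 10, 9, 8]
    print(footprint_depletion(signal, 3, 6, flank=3))
```

A score near zero indicates no depletion relative to the flanks; real footprinting tools additionally correct for the sequence preferences of the transposase before scoring.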
Project description:The fungal pathogen Candida glabrata is an emerging cause of candidiasis in part owing to its robust ability to acquire tolerance to the major clinical antifungal drug fluconazole. Similar to the related species Candida albicans, C. glabrata most typically gains azole tolerance via transcriptional induction of a suite of resistance genes, including a locus encoding an ABCG-type ATP-binding cassette (ABC) transporter that is referred to as CDR1 in Candida species. In C. glabrata, CDR1 expression is controlled primarily by the activity of a transcriptional activator protein called Pdr1. Strains exhibiting reduced azole susceptibility often contain substitution mutations in PDR1 that in turn lead to elevated mRNA levels of target genes with associated azole resistance. Pdr1 activity is also induced upon loss of the mitochondrial genome status and upon challenge by azole drugs. While extensive analyses of the transcriptional effects of Pdr1 have identified a number of genes that are regulated by this factor, we cannot yet separate direct from indirect target genes. Here we used chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-seq) to identify the promoters and associated genes directly regulated by Pdr1. These genes include many that are shared with the yeast Saccharomyces cerevisiae but others that are unique to C. glabrata, including the ABC transporter-encoding locus YBT1, genes involved in DNA repair, and several others. These data provide the outline for understanding the primary response genes involved in production of Pdr1-dependent azole resistance in C. glabrata.
Project description:Despite almost 40 years of molecular genetics research in Escherichia coli, a major fraction of its transcription start sites (TSSs) is still unknown, which limits our understanding of the regulatory circuits that control gene expression in this model organism. RegulonDB (http://regulondb.ccg.unam.mx/) aims to integrate the genetic regulatory network of E. coli K12 and has, until now, been an entirely bioinformatic project. In this work, we extended its aims by generating experimental data at a genome scale on TSSs, promoters and regulatory regions. We implemented a modified 5' RACE protocol and an unbiased High Throughput Pyrosequencing Strategy (HTPS) that allowed us to map more than 1700 TSSs with high precision. Of this collection, about 230 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca. 1500 mapped TSSs belong to about 1000 different genes, many of them with no assigned function. We identified the promoter sequences and the types of sigma factors that control the expression of about 80% of these genes. As expected, housekeeping sigma(70) promoters were the most common type, followed by sigma(38). The majority of the putative TSSs were located 20 to 40 nucleotides from the translational start site. Putative regulatory binding sites for transcription factors were detected upstream of many TSSs. For a few transcripts, riboswitches and small RNAs were found. Several genes also had additional TSSs within the coding region. Unexpectedly, the HTPS experiments revealed extensive antisense transcription, probably for regulatory functions.
The new information in RegulonDB, now with more than 2400 experimentally determined TSSs, strengthens the accuracy of promoter prediction, operon structure, and regulatory networks and provides valuable new information that will facilitate a global understanding of the complex and intricate regulatory network that operates in E. coli.
Project description:Identification of transcription factor targets is critical to understanding gene regulatory networks. Here, we uncover transcription factor binding sites and target genes employing systematic evolution of ligands by exponential enrichment (SELEX). Instead of selecting randomly synthesized DNA oligonucleotides as in most SELEX studies, we utilized zebrafish genomic DNA to isolate fragments bound by Fezf2, an evolutionarily conserved gene critical for vertebrate forebrain development. This is, to our knowledge, the first time that SELEX has been applied to a vertebrate genome. Computational analysis of bound genomic fragments predicted a core consensus binding site, which identified response elements that mediated Fezf2-dependent transcription both in vitro and in vivo. Fezf2-bound fragments were enriched for conserved sequences. Surprisingly, ~20% of these fragments overlapped well-annotated protein-coding exons. Through loss of function, gain of function, and chromatin immunoprecipitation, we further identified and validated eomesa/tbr2 and lhx2b as biologically relevant target genes of Fezf2. Mutations in eomesa/tbr2 cause microcephaly in humans, whereas lhx2b is a critical regulator of cell fate and axonal targeting in the developing forebrain. These results demonstrate the feasibility of employing genomic SELEX to identify vertebrate transcription factor binding sites and target genes and reveal Fezf2 as a transcription activator and a candidate for evaluation in human microcephaly.
Project description:BACKGROUND: When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database. METHODS: We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment and, in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets. RESULTS: In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan produces better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. The application of the algorithm to real sequence data from seven Enterobacteriales species identifies novel Crp and PurR transcription factor binding sites, thus providing several new potential sites for these transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli. CONCLUSION: Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region.
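The basic step that the abstract above builds on, scanning a sequence with a motif model built from known sites, can be sketched as follows. This is a plain single-sequence log-odds scan, not PhyloScan: it uses no orthologous sequences, no phylogenetic weighting and no significance calculation. The count matrix, the uniform background and the pseudocount are hypothetical toy values, and the sequence is assumed to contain only uppercase A, C, G and T.

```python
import math

# Hypothetical toy count matrix (one dict per motif position), not real data.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def log_odds_matrix(counts, background, pseudocount=0.5):
    """Convert per-position base counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of motif width; return (start, window, score) tuples."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_hit(sequence, matrix):
    """Return the highest-scoring window in the sequence."""
    return max(scan(sequence, matrix), key=lambda hit: hit[2])


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS, BACKGROUND)
    print(best_hit("TTACGTTT", pwm))
```

On a genome-scale database, few windows scored this way reach statistical significance on their own, which is the limitation the abstract cites as motivation for combining evidence across species and across multiple sites in one intergenic region.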
Project description:A resource that provides candidate transcription factor binding sites (TFBSs) does not currently exist for cattle. Such data is necessary, as predicted sites may serve as excellent starting locations for future omics studies to develop transcriptional regulation hypotheses. In order to generate this resource, we employed a phylogenetic footprinting approach (using sequence conservation across cattle, human and dog) and position-specific scoring matrices to identify 379,333 putative TFBSs upstream of nearly 8000 Mammalian Gene Collection (MGC) annotated genes within the cattle genome. Comparisons of our predictions to known binding site loci within the PCK1, ACTA1 and G6PC promoter regions revealed 75% sensitivity for our method of discovery. Additionally, we intersected our predictions with known cattle SNP variants in dbSNP and on the Illumina BovineHD 770k and Bos 1 SNP chips, finding 7534, 444 and 346 overlaps, respectively. Due to our stringent filtering criteria, these results represent high quality predictions of putative TFBSs within the cattle genome. All binding site predictions are freely available at http://bfgl.anri.barc.usda.gov/BovineTFBS/ or http://199.133.54.77/BovineTFBS.
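The two evaluation steps mentioned above, measuring sensitivity against known binding sites and intersecting predictions with SNP positions, can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: intervals are assumed to be half-open `(start, end)` pairs on a single chromosome, SNPs are single integer positions, and all function names and coordinates are hypothetical.

```python
from bisect import bisect_left


def sensitivity(known_sites, predicted_sites):
    """Fraction of known sites overlapped by at least one predicted site."""
    if not known_sites:
        raise ValueError("need at least one known site")
    recovered = sum(
        1
        for k_start, k_end in known_sites
        if any(p_start < k_end and k_start < p_end
               for p_start, p_end in predicted_sites)
    )
    return recovered / len(known_sites)


def count_sites_with_snps(sites, snp_positions):
    """Count sites containing at least one SNP (start <= position < end)."""
    snps = sorted(snp_positions)
    overlapping = 0
    for start, end in sites:
        # Number of sorted SNP positions that fall inside [start, end).
        if bisect_left(snps, end) - bisect_left(snps, start) > 0:
            overlapping += 1
    return overlapping


if __name__ == "__main__":
    # Hypothetical toy coordinates, not taken from the cattle data set.
    known = [(0, 5), (10, 15), (20, 25), (30, 35)]
    predicted = [(3, 8), (12, 13), (24, 30)]
    print(sensitivity(known, predicted))
    print(count_sites_with_snps([(10, 20), (30, 40), (50, 60)], [5, 15, 19, 40, 55]))
```

For genome-wide data with several chromosomes, the same logic would be applied per chromosome, or delegated to an interval tool.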
Project description:Although the lineage-determining ability of transcription factors is often modulated according to cellular context, the mechanisms by which such switching occurs are not well known. Using a transcriptional programming model, we found that Atoh1 is repurposed from a neuronal to an inner ear hair cell (HC) determinant by the combined activities of Gfi1 and Pou4f3. In this process, Atoh1 maintains its regulation of neuronal genes but gains ability to regulate HC genes. Pou4f3 enables Atoh1 access to genomic locations controlling the expression of sensory (including HC) genes, but Atoh1 + Pou4f3 are not sufficient for HC differentiation. Gfi1 is key to the Atoh1-induced lineage switch, but surprisingly does not alter Atoh1's binding profile. Gfi1 acts in two divergent ways. It represses the induction by Atoh1 of genes that antagonise HC differentiation, a function in keeping with its well-known repressor role in haematopoiesis. Remarkably, we find that Gfi1 also acts as a co-activator: it binds directly to Atoh1 at existing target genes to enhance its activity. These findings highlight the diversity of mechanisms by which one TF can redirect the activity of another to enable combinatorial control of cell identity.