Project description: DNA determines where and when genes are expressed, but the full set of sequence determinants that control gene expression is not known. Here, we used massively parallel reporter assays to measure the transcriptional activity of DNA sequences representing a sequence space ~100 times larger than the human genome. Machine learning models revealed that transcription factors (TFs) generally act additively, with weak grammar, and that enhancers increase expression from a promoter by a mechanism that does not involve specific TF-TF interactions. The enhancers themselves can be classified into three distinct types: classical, closed chromatin and chromatin-dependent enhancers. We also show that few TFs are strongly active in a cell, with most activities similar between cell types. Individual TFs can have multiple gene regulatory activities, including chromatin opening, enhancing, promoting and TSS-determining activity, consistent with the view that the TF binding motif is the only atomic unit of gene expression.