Dataset Information

Deciphering eukaryotic gene-regulatory logic with 100 million random promoters.

ABSTRACT: How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF's specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.

SUBMITTER: de Boer CG

PROVIDER: S-EPMC6954276 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

Deciphering eukaryotic gene-regulatory logic with 100 million random promoters.

de Boer CG, Vaishnav ED, Sadeh R, Abeyta EL, Friedman N, Regev A

Nature Biotechnology, issue 1 (published online 2019-12-02)

PMID: 31792407

Similar Datasets

Deciphering cis-regulatory logic with 100 million synthetic promoters
2019-01-01 | GSE104878 | GEO

Deciphering cis-regulatory logic with 100 million synthetic promoters (MNase)
2019-01-01 | GSE104903 | GEO

Deciphering cis-regulatory logic with 100 million synthetic promoters
| PRJNA414104 | ENA

Deciphering cis-regulatory logic with 100 million synthetic promoters (MNase)
| PRJNA414115 | ENA

Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters.
| S-EPMC3374032 | biostudies-literature

Deciphering the transcriptional regulatory logic of amino acid metabolism.
| S-EPMC3777760 | biostudies-literature

Identification of tumor-specific Salmonella Typhimurium promoters and their regulatory logic.
| S-EPMC3326293 | biostudies-literature

Deciphering the regulatory logic of an ancient, ultraconserved nuclear receptor enhancer module.
| S-EPMC4447637 | biostudies-other

Gene regulatory logic of dopamine neuron differentiation.
| S-EPMC2671564 | biostudies-literature

Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time.
| S-EPMC7567609 | biostudies-literature