
Dataset Information


Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome.


ABSTRACT: A major challenge in interpreting genome sequences is understanding how the genome encodes the information that specifies when and where a gene will be expressed. The first step in this process is the identification of regions of the genome that contain regulatory information. In higher eukaryotes, this cis-regulatory information is organized into modular units [cis-regulatory modules (CRMs)] of a few hundred base pairs. A common feature of these cis-regulatory modules is the presence of multiple binding sites for multiple transcription factors. Here, we evaluate the extent to which the tendency for transcription factor binding sites to be clustered can be used as the basis for the computational identification of cis-regulatory modules. By using published DNA binding specificity data for five transcription factors active in the early Drosophila embryo, we identified genomic regions containing unusually high concentrations of predicted binding sites for these factors. A significant fraction of these binding site clusters overlap known CRMs that are regulated by these factors. In addition, many of the remaining clusters are adjacent to genes expressed in a pattern characteristic of genes regulated by these factors. We tested one of the newly identified clusters, mapping upstream of the gap gene giant (gt), and show that it acts as an enhancer that recapitulates the posterior expression pattern of gt.
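The clustering idea in the abstract can be illustrated with a minimal sliding-window sketch. It assumes binding sites have already been predicted, for example by scanning the genome with matrices built from the published specificity data, and are supplied as genomic coordinates pooled across all factors. The function name `find_site_clusters`, the window size, and the site-count threshold are hypothetical placeholders. They are not the values or the exact procedure used in the study. The study looked for *unusually* high site densities, and a fixed count threshold is only a simplification of that idea.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def find_site_clusters(
    site_positions: List[int], window: int, min_sites: int
) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: report regions where some stretch shorter than
    `window` bp contains at least `min_sites` predicted binding sites.

    Returns merged regions as (start, end, n_sites) tuples.
    """
    positions = sorted(site_positions)  # sites pooled from all factors
    merged: List[List[int]] = []        # [start, end] of each cluster
    left = 0
    for right, pos in enumerate(positions):
        # Advance the left edge until sites left..right span < window bp.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            start = positions[left]
            if merged and start <= merged[-1][1]:
                # Dense window overlaps the previous cluster: extend it.
                merged[-1][1] = pos
            else:
                merged.append([start, pos])
    # Count how many predicted sites fall inside each merged region.
    return [
        (start, end, bisect_right(positions, end) - bisect_left(positions, start))
        for start, end in merged
    ]


if __name__ == "__main__":
    # Toy coordinates: two dense groups separated by isolated sites.
    sites = [100, 150, 200, 250, 5000, 5010, 9000, 9050, 9100, 9150, 9200]
    print(find_site_clusters(sites, window=300, min_sites=4))
```

With these toy coordinates the sketch reports two clusters, `(100, 250, 4)` and `(9000, 9200, 5)`. The isolated sites near position 5000 are ignored. Pooling sites from several factors reflects the abstract's observation that CRMs contain multiple sites for multiple factors. A more faithful version would also score window density against a background expectation.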

SUBMITTER: Berman BP 

PROVIDER: S-EPMC117378 | biostudies-literature | 2002 Jan

REPOSITORIES: biostudies-literature


Publications

Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome.

Berman Benjamin P, Nibu Yutaka, Pfeiffer Barret D, Tomancak Pavel, Celniker Susan E, Levine Michael, Rubin Gerald M, Eisen Michael B

Proceedings of the National Academy of Sciences of the United States of America, 2002 Jan 1, issue 2



Similar Datasets

| S-EPMC3084208 | biostudies-literature
2007-01-25 | GSE6409 | GEO
| S-EPMC2697649 | biostudies-literature
| S-EPMC2763271 | biostudies-literature
| S-EPMC3799424 | biostudies-literature
| S-EPMC2768654 | biostudies-literature
| S-EPMC2395258 | biostudies-literature
| S-EPMC1796902 | biostudies-literature
| S-EPMC4290730 | biostudies-literature
| S-EPMC521067 | biostudies-literature