ABSTRACT: We have designed a zebrafish genomic microarray to identify DNA-protein interactions in the proximal promoter regions of over 11,000 zebrafish genes. Using these microarrays, together with chromatin immunoprecipitation with an antibody directed against tri-methylated lysine 4 of histone H3, we demonstrate the feasibility of this method in zebrafish. This approach will allow investigators to determine the genomic binding locations of DNA-interacting proteins during development and expedite the assembly of the genetic networks that regulate embryogenesis.

Genomic array design

Microarrays were designed as described below and manufactured by Agilent Technologies (www.agilent.com). Further information on the design can be found at http://jura.wi.mit.edu/bioc/gbell/zfish_chip/.

Selection of transcription start sites and identification of promoter sequences

To assemble an extensive list of zebrafish transcripts, we interrogated 5 databases: Ensembl, VEGA, RefSeq, ZGC full-length clones and a database provided by Dr. Leonard Zon (Harvard Medical School, Boston, USA). The Zon lab database is a hand-curated database of zebrafish genes that have homologues in other species. We included all transcripts that appeared in the manually annotated databases (VEGA, Zon) and in the ZGC full-length database. We also identified genes present in any 2 of the 5 databases and included those not already selected. The transcripts were mapped to the zebrafish genome (Zv4, June 2004) obtained from UCSC Bioinformatics (http://genome.ucsc.edu) and the transcription start site (TSS) for each transcript was determined. Transcripts with TSSs within 500bp of each other were clustered into a transcriptional unit (TU), and promoter regions were identified relative to the most upstream TSS. This resulted in the identification of 13,413 TUs and corresponding promoter regions. Each promoter region was extracted and masked for repetitive sequence using RepeatMasker.
If the promoter region contained a gap, the upstream sequence was also masked. Information on the transcriptional units that were included in the final design can be found at http://jura.wi.mit.edu/bioc/gbell/zfish_chip/.

Selection of oligonucleotides

60-mer oligonucleotide probes representing the region between 1.5kb upstream and 0.5kb downstream of the annotated TSS of each transcriptional unit were then designed. Although transcription factors and other DNA binding proteins are known to regulate genes from distances beyond -1.5kb or +0.5kb, much information can be gained from regions close to the TSS [45], and the H3K4Me3 mark studied in this paper is found at the most 5′ end of a gene, close to the TSS. Selection of 60-mers for the microarrays was essentially as described in [14], using the Zv4 build of the zebrafish genome and a locally customized version of ArrayOligoSelector. 60-mers were chosen so that promoter regions contained approximately one probe every 250bp, with the maximum distance between probes for each promoter region set at 600bp. TUs for which only one probe could be designed were not included in the final design. This process yielded 80,839 probes for 11,171 promoter regions.

We also incorporated several sets of control probes, both positive and negative. On each array there are 1090 probes designed against “gene desert” regions, which are genomic regions that are unlikely to be bound by transcriptional regulators, and 270 probes designed against Arabidopsis thaliana genes, which are not present in the zebrafish genome (by BLAST). In addition, because our main motivation for making these microarrays is to identify mesodermally-regulated genes, we included 7 genes expressed in mesoderm during gastrulation as positive controls (wnt11, flh, vent, msgn1, myod, fgf8, pcdh8). Probes designed against these promoters, which cover 3-4kb around each TSS, are arrayed 2-4 times on each slide.
Since these genes are expressed at gastrula stages to varying degrees, they also serve as positive controls in this study. Finally, there are 2256 controls added by Agilent and a variable number of blank spots.

These probes were divided between two microarray slides, each with 44,290 features. We refer to these two microarray slides as the “proximal promoter set”. A proximal promoter set based on these designs, as well as an expanded set of 9 slides that contain regions from -9kb to +3kb relative to the TSS, is available by contacting Agilent (www.agilent.com) or by downloading the design files from http://jura.wi.mit.edu/bioc/gbell/zfish_chip/ for self-manufacture.
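
As an illustration of the design rules described above, the following Python sketch clusters TSSs into transcriptional units and derives the -1.5kb to +0.5kb promoter window relative to the most upstream TSS. It is not the code used to build the arrays. Several details are assumptions that the text does not specify: clustering is done per chromosome and strand by chaining sorted TSSs (a TSS joins the current unit if it lies within 500bp of the previous one), "most upstream" is taken to mean the smallest coordinate on the + strand and the largest on the - strand, and probe spacing is measured between probe start positions. The function names and the input tuple layout are hypothetical.

```python
from collections import defaultdict

CLUSTER_DISTANCE = 500   # bp; TSSs this close are grouped into one TU
UPSTREAM = 1500          # bp upstream of the TSS covered by probes
DOWNSTREAM = 500         # bp downstream of the TSS covered by probes
MAX_PROBE_GAP = 600      # bp; maximum allowed distance between probes


def make_unit(chrom, strand, entries):
    """Build one transcriptional unit from a list of (tss, name) pairs."""
    positions = [tss for tss, _ in entries]
    # Assumption: "most upstream" depends on strand orientation.
    if strand == "+":
        anchor = min(positions)
        start, end = anchor - UPSTREAM, anchor + DOWNSTREAM
    else:
        anchor = max(positions)
        start, end = anchor - DOWNSTREAM, anchor + UPSTREAM
    return {
        "chrom": chrom,
        "strand": strand,
        "transcripts": [name for _, name in entries],
        "tss": anchor,
        "promoter": (max(start, 0), end),  # clamp at the chromosome start
    }


def cluster_tss(transcripts, max_gap=CLUSTER_DISTANCE):
    """Group (name, chrom, strand, tss) records into transcriptional units."""
    by_locus = defaultdict(list)
    for name, chrom, strand, tss in transcripts:
        by_locus[(chrom, strand)].append((tss, name))

    units = []
    for (chrom, strand), entries in sorted(by_locus.items()):
        entries.sort()
        current = [entries[0]]
        for entry in entries[1:]:
            # Chain onto the current unit if close to the previous TSS.
            if entry[0] - current[-1][0] <= max_gap:
                current.append(entry)
            else:
                units.append(make_unit(chrom, strand, current))
                current = [entry]
        units.append(make_unit(chrom, strand, current))
    return units


def probe_spacing_ok(probe_starts, max_gap=MAX_PROBE_GAP):
    """True if there are at least two probes and no gap exceeds max_gap."""
    starts = sorted(probe_starts)
    if len(starts) < 2:
        return False  # single-probe TUs were excluded from the design
    return all(b - a <= max_gap for a, b in zip(starts, starts[1:]))


if __name__ == "__main__":
    example = [
        ("a", "chr1", "+", 10000),
        ("b", "chr1", "+", 10400),
        ("c", "chr1", "+", 12500),
        ("d", "chr1", "-", 50000),
        ("e", "chr1", "-", 50300),
    ]
    for unit in cluster_tss(example):
        print(unit)
```

In this toy input, transcripts a and b fall into one unit because their TSSs are 400bp apart, whereas c starts a new unit. A different linkage rule (for example, requiring every TSS to lie within 500bp of the most upstream one) would give different units for long chains of closely spaced TSSs.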