ABSTRACT: Amartya Sanyal <amartya.sanyal@umassmed.edu> (Wet Lab), Bryan R. Lajoie <bryan.lajoie@umassmed.edu>, Gaurav Jain <gaurav.jain@umassmed.edu> (Dry Lab), Job Dekker <job.dekker@umassmed.edu> (Principal Investigator)

This track contains chromatin interaction data generated using the 5C (Chromosome Conformation Capture Carbon Copy) method by the ENCODE group (Dekker Lab) located at the University of Massachusetts, Worcester, MA. The track shows the significant looping interactions between transcription start sites (TSS) and distal regulatory elements in the context of the 44 ENCODE pilot regions, which span 1% of the human genome.

Although DNA is a linear sequence, chromatin, which is packed and organized inside the nucleus, does not function linearly. This is most clearly illustrated by the fact that genes are often regulated by elements located hundreds of kilobases away in the linear genome. Imaging techniques have shown that regulatory elements can act over large genomic distances by engaging in direct physical interactions with target genes, resulting in the formation of chromatin loops. Based on these observations, we have envisaged that the spatial organization of the genome resembles a three-dimensional network driven by physical associations between genes and regulatory elements, both in cis (within the same chromosome) and in trans (between different chromosomes) (Dekker, 2006).

Apart from imaging, which is labor-intensive and low-throughput, long-range chromatin looping interactions can be detected using the Chromosome Conformation Capture (3C) technology (Dekker et al., 2002). The 3C method employs formaldehyde cross-linking to covalently link interacting chromatin segments in intact cells. Cells are subsequently lysed and the chromatin is digested with a restriction enzyme of choice. The digested fragments are then ligated under dilute conditions to facilitate intramolecular ligation.
The result is a genome-wide interaction library of ligation products corresponding to all possible chromatin interactions. Specific ligation products can then be detected by PCR using specific primer pairs.

The 5C method was developed to dramatically increase 3C throughput (Dostie et al., 2006; Dostie and Dekker, 2007). It greatly increases the scale of chromatin interaction detection by replacing the PCR detection step of 3C with ligation-mediated amplification (LMA). LMA allows a much higher level of multiplexing: thousands of primers are used in a single reaction to detect millions of chromatin interactions (ligation junctions) in parallel. The LMA step effectively "copies" 3C ligation products into much smaller 5C ligation products that precisely correspond to ligation junctions formed during the 3C procedure. The products of the multiplexed LMA reaction constitute the 5C library, whose composition is determined using high-throughput DNA sequencing.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

The aim of the pilot study was to generate a "connectivity map" between transcription start sites (TSS) and distal regulatory elements within the 44 ENCODE pilot regions. In the current scheme, 5C primers were designed for all HindIII restriction fragments. Reverse primers were designed on fragments containing the TSS of annotated genes. Forward primers were designed on all other fragments. This design allowed every TSS to be interrogated against all other restriction fragments, thus generating an interaction map between TSS and regulatory elements. For gene desert ENCODE pilot regions (for example ENr313), an altered scheme of forward and reverse primers was designed. Primers were selected for relative uniqueness using a custom 15-mer frequency table and BLAST.
A custom hexamer barcode was added to each primer to ensure the sequence was unique relative to the primer pool being used. Primers were also selected for appropriate melting temperature and GC content, and included a universal tail sequence for amplification.

The 44 ENCODE regions were analyzed in two groups using two separate 5C primer pools. The first group (ENm) contained the manually picked ENCODE regions, ENm001-014, and ENr313. The second group (ENr) contained the 30 randomly picked ENCODE regions. Each 5C primer pool was made by pooling the 5C primers for interrogating long-range interactions in the corresponding group of ENCODE regions.

The primer pool for the ENm group contained a total of 3,150 primers (476 reverse 5C primers and 2,674 forward 5C primers). This primer pool allowed interrogation of a total of 1,272,824 interactions. Of these, 83,427 interactions were between fragments that were both located in the same ENCODE region.

The primer pool for the ENr group contained a total of 3,152 primers (505 reverse 5C primers and 2,647 forward 5C primers). This primer pool allowed interrogation of a total of 1,336,735 interactions. Of these, 34,859 interactions were between fragments that were both located in the same ENCODE region.

In total, 981 reverse primers and 5,321 forward primers were designed, corresponding to ~77.1% (6,302 of 8,174) of all HindIII fragments in the 44 ENCODE regions.

Currently, data for two biological replicates have been generated for ENCODE Tier I (GM12878 and K562), Tier II (HeLa-S3), and H1 human embryonic stem cells (H1-hESC). The 14 ENCODE manual regions together with one random region (ENr313), and the 30 random regions, were analyzed separately using high-throughput paired-end sequencing on the Illumina GA2 platform. Looping interactions that are detected in both biological replicates are considered significant.
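
The pool sizes quoted above are consistent with every reverse primer being paired with every forward primer in the same pool. The short Python sketch below checks that arithmetic and illustrates the "detected in both replicates" rule as a plain set intersection. It is an illustration only, not part of the Dekker Lab pipeline: the function names and the fragment labels are hypothetical, and how an interaction is called within a single replicate is not described here, so the replicate sets are invented placeholders.

```python
# Primer counts per pool, copied from the text above.
POOLS = {
    "ENm": {"reverse": 476, "forward": 2674},
    "ENr": {"reverse": 505, "forward": 2647},
}
TOTAL_HINDIII_FRAGMENTS = 8174  # HindIII fragments in the 44 ENCODE regions


def interrogated_pairs(reverse: int, forward: int) -> int:
    """Number of reverse-forward primer pairs when every pair is tested."""
    return reverse * forward


def reproducible(rep1: set, rep2: set) -> set:
    """Interactions present in both replicates (assumed rule: set intersection)."""
    return rep1 & rep2


# Per-pool primer totals and the number of interrogated primer pairs.
for name, pool in POOLS.items():
    n_primers = pool["reverse"] + pool["forward"]
    n_pairs = interrogated_pairs(pool["reverse"], pool["forward"])
    print(f"{name}: {n_primers:,} primers, {n_pairs:,} interrogated pairs")

# Fraction of HindIII fragments that received a primer.
total_primers = sum(p["reverse"] + p["forward"] for p in POOLS.values())
coverage = total_primers / TOTAL_HINDIII_FRAGMENTS
print(f"{total_primers:,} primers cover {coverage:.1%} of HindIII fragments")

# Hypothetical interaction calls, keyed by (reverse fragment, forward fragment).
rep1 = {("REV_1", "FOR_7"), ("REV_1", "FOR_9"), ("REV_2", "FOR_3")}
rep2 = {("REV_1", "FOR_7"), ("REV_2", "FOR_3"), ("REV_2", "FOR_8")}
print(sorted(reproducible(rep1, rep2)))
```

The products 476 x 2,674 and 505 x 2,647 reproduce the interaction totals given for the ENm and ENr pools, and 6,302 of 8,174 fragments rounds to the stated ~77.1%.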