Dataset Information

Chromatin Interactions by 5C from ENCODE/Dekker Univ. Mass.

ABSTRACT: Amartya Sanyal, amartya.sanyal@umassmed.edu (Wet Lab); Bryan R. Lajoie, bryan.lajoie@umassmed.edu, and Gaurav Jain, gaurav.jain@umassmed.edu (Dry Lab); Job Dekker, job.dekker@umassmed.edu (Principal Investigator).

This track contains chromatin interaction data generated using the 5C (Chromosome Conformation Capture Carbon Copy) method by the ENCODE group (Dekker Lab) located at the University of Massachusetts, Worcester, MA. The track shows the significant looping interactions between transcription start sites (TSS) and distal regulatory elements in the context of the 44 ENCODE pilot regions, which span 1% of the human genome.

Although DNA is a linear sequence, chromatin, which is packed and organized inside the nucleus, does not function linearly. This is most clearly illustrated by the fact that genes are often regulated by elements located hundreds of kilobases away in the linear genome. Imaging techniques have shown that regulatory elements can act over large genomic distances by engaging in direct physical interactions with target genes, resulting in the formation of chromatin loops. Based on these observations, we have envisaged that the spatial organization of the genome resembles a three-dimensional network driven by physical associations between genes and regulatory elements, both in cis (within the same chromosome) and in trans (between different chromosomes) (Dekker, 2006).

Apart from imaging, which is labor-intensive and low-throughput, long-range chromatin looping interactions can be detected using Chromosome Conformation Capture (3C) technology (Dekker et al., 2002). The 3C method uses formaldehyde cross-linking to covalently link interacting chromatin segments in intact cells. Cells are subsequently lysed and the chromatin is digested with a restriction enzyme of choice. The digested fragments are then ligated under dilute conditions to favor intramolecular ligation. The result is a genome-wide interaction library of ligation products corresponding to all possible chromatin interactions. Specific ligation products can then be detected by PCR using specific primer pairs.

The 5C method was developed to dramatically increase 3C throughput (Dostie et al., 2006; Dostie and Dekker, 2007). It increases the scale of chromatin interaction detection by replacing the PCR detection step of 3C with ligation-mediated amplification (LMA). LMA allows a much higher level of multiplexing: thousands of primers are used in a single reaction to detect millions of chromatin interactions (ligation junctions) in parallel. The LMA step effectively "copies" 3C ligation products into much smaller 5C ligation products that precisely correspond to the ligation junctions formed during the 3C procedure. The products of the multiplexed LMA reaction constitute the 5C library, whose composition is determined by high-throughput DNA sequencing.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

The aim of the pilot study was to generate a "connectivity map" between transcription start sites (TSS) and distal regulatory elements within the 44 ENCODE pilot regions. In the current scheme, 5C primers were designed for all HindIII restriction fragments. Reverse primers were designed on fragments containing the TSS of annotated genes, and forward primers were designed on all other fragments. This design allowed the interrogation of every TSS against all other restriction fragments, thus generating an interaction map between TSS and regulatory elements. For gene-desert ENCODE pilot regions (for example ENr313), an altered scheme of forward and reverse primers was designed. Primers were selected for relative uniqueness using a custom 15-mer frequency table and BLAST. A custom hexamer barcode was added to each primer to ensure that its sequence was unique relative to the primer pool being used. Primers were also selected for appropriate melting temperature and GC content, and carried a universal tail sequence for amplification.

The 44 ENCODE regions were analyzed in two groups using two separate 5C primer pools. The first group (ENm) contained the manually picked ENCODE regions, ENm001-014, and ENr313. The second group (ENr) contained the 30 randomly picked ENCODE regions. Each primer pool was made by pooling the 5C primers for interrogating long-range interactions in the corresponding group of regions. The primer pool for the ENm group contained a total of 3,150 primers (476 reverse and 2,674 forward 5C primers) and allowed interrogation of a total of 1,272,824 interactions, of which 83,427 were between fragments located in the same ENCODE region. The primer pool for the ENr group contained a total of 3,152 primers (505 reverse and 2,647 forward 5C primers) and allowed interrogation of a total of 1,336,735 interactions, of which 34,859 were between fragments located in the same ENCODE region. In total, 981 reverse primers and 5,321 forward primers were designed, corresponding to ~77.1% (6,302/8,174) of all HindIII fragments in the 44 ENCODE regions.

Currently, data for two biological replicates have been generated for ENCODE Tier I (GM12878 and K562) and Tier II (HeLa-S3) cell lines and for H1 human embryonic stem cells (H1-hESC). The 14 ENCODE manual regions together with one random region (ENr313), and the 30 random regions, were analyzed separately using high-throughput paired-end sequencing on the Illumina GA2 platform. Looping interactions detected in both biological replicates are considered significant.
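The primer-pool figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the number of interrogated interactions in a pool is simply the number of reverse primers times the number of forward primers. The abstract does not state this rule explicitly, but it reproduces the reported totals. The class and function names are illustrative only and are not part of any ENCODE tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPool:
    """Primer counts for one 5C pool, taken from the abstract."""
    name: str
    reverse: int            # reverse primers, on TSS-containing fragments
    forward: int            # forward primers, on all other fragments
    same_region_pairs: int  # reported pairs within a single ENCODE region

    @property
    def total_primers(self) -> int:
        return self.reverse + self.forward

    @property
    def interrogated_pairs(self) -> int:
        # Assumption: every reverse primer can pair with every forward
        # primer in the same pool (a full reverse x forward design).
        return self.reverse * self.forward

    @property
    def same_region_fraction(self) -> float:
        return self.same_region_pairs / self.interrogated_pairs


ENM = PrimerPool("ENm", reverse=476, forward=2674, same_region_pairs=83_427)
ENR = PrimerPool("ENr", reverse=505, forward=2647, same_region_pairs=34_859)
TOTAL_HINDIII_FRAGMENTS = 8_174  # all HindIII fragments in the 44 regions


def fragment_coverage(pools, total_fragments: int) -> float:
    """Fraction of HindIII fragments that carry a 5C primer."""
    return sum(p.total_primers for p in pools) / total_fragments


if __name__ == "__main__":
    for pool in (ENM, ENR):
        print(pool.name, pool.total_primers, pool.interrogated_pairs,
              f"{pool.same_region_fraction:.2%} within one region")
    coverage = fragment_coverage([ENM, ENR], TOTAL_HINDIII_FRAGMENTS)
    print(f"fragment coverage: {coverage:.1%}")
```

Under that assumption the products match the reported 1,272,824 and 1,336,735 interactions, and only a small fraction of the interrogated pairs in each pool falls within a single ENCODE region.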

ORGANISM(S): Homo sapiens

SUBMITTER: UCSC ENCODE DCC

PROVIDER: E-GEOD-39510 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Richard Sandstrom, sull@u.washington.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). This track contains chromatin interaction data from the University of Washington ENCODE group generated using 5C (Chromosome Conformation Capture Carbon Copy). The 5C method is used here to define short- and long-range interactions between transcription start sites (TSS) and DNaseI hypersensitive sites (DHS) or other genomic features. The 5C method is summarized below. Transcription factors bind to promoter-associated proteins, bringing the associated DNA sequences into close proximity to each other. Cross-linking the DNA and proteins immobilizes these interactions and thus maintains their close proximity. Cleavage of the sample with a restriction endonuclease followed by ligation results in hybrid molecules in which a fragment with a regulatory element is physically joined to a fragment containing a TSS. The interactions are then detected by oligonucleotide-dependent, ligation-mediated assays, where one set of primers is complementary to the ends of fragments with a TSS and the second set of primers is complementary to fragments with a feature. Primers are designed to the forward strand of the feature and the reverse strand of the TSS so that ligation occurs only between a TSS and a feature, not between different features. Specific interactions are detected by massively parallel sequencing. The data in this track comprise two different experiment types focusing on targeted regions: Gene-targeted project: Analysis of DNase I hypersensitive sites reveals many genes with multiple sites restricted to the cell type in which the protein is observed to be expressed. These sites potentially identify regulatory sites for the gene. 
This set of experiments attempts to observe interactions between these DHS sites and transcription starts in 25 regions selected based on genes expressed in GM06990 (B-lymphocyte), BJ (foreskin fibroblast), HepG2 (liver cancer cell line), or SK-N-SH_RA (neuroblastoma cell line, SKNSH, differentiated with retinoic acid). Myc project: Genome-wide association studies have identified SNPs linked to prostate, colon, and breast cancer in the gene desert region upstream of the MYC gene. 5C analysis of HindIII fragments interacting with those containing RefSeq txStarts in this region was performed in 5 cell types: GM12878 (B-lymphocyte), CaCo2 (colon cancer cell line), LNCaP (prostate cancer cell line), MCF7 (breast cancer cell line), and K562 (erythroleukemia cell line). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Yijun Ruan, ruanyj@gis.a-star.edu.sg). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). This track was produced as part of the ENCODE Project. It shows the locations of protein-factor-mediated chromatin interactions determined by Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) data (Fullwood et al., 2010) extracted from five different human cancer cell lines: K562 (chronic myeloid leukemia), HCT116 (colorectal cancer), HeLa-S3 (cervical cancer), MCF-7 (breast cancer), and NB4 (promyelocytic leukemia). A chromatin interaction is defined as the association of two regions of the genome that are far apart in terms of genomic distance but are spatially close to each other in the 3-dimensional nucleus. Additionally, ChIA-PET experiments identify transcription factor binding sites. A binding site is defined as a region of the genome that is highly enriched by specific Chromatin ImmunoPrecipitation (ChIP) against a transcription factor, which indicates that the transcription factor binds specifically to this region. The protein factors displayed in the track include estrogen receptor alpha (ERa), RNA polymerase II (RNAPII), and CCCTC-binding factor (CTCF). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) is a global, de novo, high-throughput method for characterizing the 3-dimensional structure of chromatin in the nucleus. In the ChIA-PET protocol, samples were cross-linked and fragmented, then subjected to chromatin immunoprecipitation. The DNA fragments that were brought together by the chromatin interactions were then proximity-ligated. 
During this proximity-ligation step, the half-linkers (created by the fragmentation) containing flanking MmeI sites (MmeI is a type IIS restriction enzyme) were first ligated to the DNA fragments and then ligated to each other to form full linkers. Full linkers bridge either the two ends of a self-circularized fragment or the ends of two different chromatin fragments. The material was then reverse cross-linked, purified, and digested with MmeI, which cuts 20 base pairs away from its recognition site. Tag-linker-tag (paired-end tag, PET) constructs were sequenced by ultra-high-throughput methods (Illumina or SOLiD paired-end sequencing). ChIA-PET reads were processed with the ChIA-PET Tool (Li et al., 2010) in the following steps: linker filtering, short-read mapping, PET classification, binding site identification, and interaction cluster identification. The high-confidence binding sites and chromatin interaction clusters were reported. Chromatin interactions identified by ChIA-PET have been validated by 3C, ChIP-3C, 4C, and DNA-FISH (Fullwood et al., 2009).
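The description above notes that a full linker bridges either the two ends of one self-circularized fragment or the ends of two different fragments. As a rough illustration of what a PET classification step might look like, the sketch below labels a mapped PET by whether its two tags lie on the same chromosome within a short span. This is an assumption made for illustration and not the ChIA-PET Tool's actual implementation. The `Tag` type, the `classify_pet` function, and the 3,000 bp cutoff are all hypothetical.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """Mapped position of one end of a paired-end tag (PET)."""
    chrom: str
    pos: int


# Hypothetical cutoff in base pairs; a real analysis would choose this
# from the data rather than hard-code it.
SELF_LIGATION_MAX_SPAN = 3_000


def classify_pet(head: Tag, tail: Tag,
                 max_span: int = SELF_LIGATION_MAX_SPAN) -> str:
    """Label a PET by the genomic relationship of its two tags.

    Assumption: the two ends of a single self-circularized fragment map
    close together, so a short same-chromosome span suggests self-ligation.
    Anything else is treated as ligation between two different fragments.
    """
    if head.chrom != tail.chrom:
        return "inter-ligation (trans)"
    if abs(head.pos - tail.pos) <= max_span:
        return "self-ligation"
    return "inter-ligation (cis)"


if __name__ == "__main__":
    examples = [
        (Tag("chr1", 1_000), Tag("chr1", 2_500)),
        (Tag("chr1", 1_000), Tag("chr1", 500_000)),
        (Tag("chr1", 1_000), Tag("chr8", 1_000)),
    ]
    for head, tail in examples:
        print(head, tail, classify_pet(head, tail))
```

In this simplified picture, self-ligation PETs would inform binding site identification and inter-ligation PETs would be the candidates for interaction clusters.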
