Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

Histone Modifications by ChIP-seq from ENCODE/Broad Institute

ABSTRACT: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (mailto:nshoresh@broad.mit.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track displays maps of chromatin state generated by the Broad/MGH ENCODE group using ChIP-seq. Chemical modifications (methylation, acetylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. The ChIP-seq method involves first using formaldehyde to cross-link histones and other DNA-associated proteins to genomic DNA within cells. The cross-linked chromatin is subsequently extracted, mechanically sheared, and immunoprecipitated using specific antibodies. After reversal of cross-links, the immunoprecipitated DNA is sequenced and mapped to the human reference genome. The relative enrichment of each antibody-target (epitope) across the genome is inferred from the density of mapped fragments. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. ChIP-seq: Cells were grown according to the approved ENCODE cell culture protocols. Cells were fixed in 1% formaldehyde and resuspended in lysis buffer. Chromatin was sheared to 200-700 bp using a Diagenode Bioruptor. Solubilized chromatin was immunoprecipitated with antibodies against each of the histone modifications assayed. Antibody-chromatin complexes were pulled down using protein A-sepharose (or anti-IgM-conjugated agarose for RNA polymerase II), washed and then eluted. After cross-link reversal and proteinase K treatment, immunoprecipitated DNA was extracted with phenol-chloroform, ethanol precipitated, treated with RNase and purified.
One to ten nanograms of DNA were end-repaired, adapter-ligated and sequenced by Illumina Genome Analyzers as recommended by the manufacturer. Alignment: Sequence reads from each IP experiment were aligned to the human reference genome (GRCh37/hg19) using MAQ with default parameters, except '-C 11' and '-H output_file', which output up to 11 additional best matches for each read (if any are found) to a file. This information was used to filter out any read that had more than 10 best matches on the genome. Note: It is likely that instances where multiple reads align to the same position and with the same orientation are due to enhanced PCR amplification of a single DNA fragment. No attempt has been made, however, to remove such artifacts from the data, following ENCODE practices. Signal: Fragment densities were computed by counting the number of reads overlapping each 25 bp bin along the genome. Densities were computed using igvtools count with default parameters (in particular, '-w 25' to set window size of 25 bp and '-f mean' to report the mean value across the window), except for '-e' set to extend the reads to 200 bp, and the .wig output was converted to bigWig using wigToBigWig from the UCSC Kent software package. Peaks: Discrete intervals of ChIP-seq fragment enrichment were identified using Scripture, a scan statistics approach, under the assumption of uniform background signal. All data sets were processed with '-task chip', and with '-windows 100,200,500,1000,5000,10000,100000'. (Neither a mask file nor the '-trim' option was used.) The resulting called segments were then further filtered to remove intervals that are significantly enriched only because they contain smaller enriched intervals within them. This post-processing step has been implemented using Matlab.
The use of the post-processing step allowed very large enriched intervals (on the order of Mbp for H3K27me3, for instance) to be detected, as well as much smaller intervals, without the need to tailor the parameters of Scripture based on prior expectations.
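
The Signal step described above can be illustrated with a minimal Python sketch. It is not a reimplementation of igvtools: the function name binned_density, the read tuple layout, and the choice to extend each read to a total length of 200 bp in its strand direction are assumptions made for illustration only.

```python
def binned_density(reads, chrom_len, bin_size=25, frag_len=200):
    """Mean per-base coverage in fixed-size bins (illustrative only).

    reads: list of (start, length, strand) with 0-based starts on one
    chromosome. Each read is extended to frag_len bases in its strand
    direction before coverage is counted (an assumed reading of the
    "extend the reads to 200 bp" setting).
    """
    coverage = [0] * chrom_len
    for start, length, strand in reads:
        span = max(length, frag_len)
        if strand == "+":
            lo, hi = start, start + span
        else:
            end = start + length
            lo, hi = end - span, end
        # Clamp the extended fragment to the chromosome boundaries.
        lo, hi = max(lo, 0), min(hi, chrom_len)
        for pos in range(lo, hi):
            coverage[pos] += 1

    n_bins = (chrom_len + bin_size - 1) // bin_size
    means = []
    for b in range(n_bins):
        window = coverage[b * bin_size:(b + 1) * bin_size]
        means.append(sum(window) / len(window))
    return means


if __name__ == "__main__":
    toy_reads = [(0, 36, "+"), (214, 36, "-")]  # hypothetical reads
    print(binned_density(toy_reads, chrom_len=250))
```

A per-base list keeps the sketch easy to follow; for whole chromosomes a difference array or an interval-based tool would be the practical choice.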

ORGANISM(S): Homo sapiens

SUBMITTER: ENCODE DCC

PROVIDER: E-GEOD-29611 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Peggy Farnham mailto:pfarnham@usc.edu for questions concerning data collection and usage and Philip Cayting mailto:pcayting@stanford.edu for data scoring and submission inquiries). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the ENCODE Project, displays maps of histone modifications genome-wide using ChIP-seq in different cell lines. The ChIP-seq method involves first using formaldehyde to cross-link histones and other DNA-associated proteins to genomic DNA within cells. The cross-linked chromatin is subsequently extracted, sheared, and immunoprecipitated using specific antibodies. After reversal of cross-links, the immunoprecipitated DNA is sequenced and mapped to the human reference genome. The relative enrichment of each antibody-target (epitope) across the genome is inferred from the density of mapped fragments. Chemical modifications (e.g. methylation or acetylation) of the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription factors. Shown for each experiment (defined as a particular antibody and a particular cell type) is a track of enrichment for the specifically modified histone (Signal), along with sites that have the greatest enrichment (Peaks). Also included for each cell type is the input signal, which represents the control condition where no antibody targeting was performed. In general the following chemical modifications have associated genetic phenotypes: H3K4me3 and H3K9Ac are considered to be marks of active or potentially active promoter regions. H3K4me1 and H3K27Ac are considered to be marks of active or potentially active enhancer regions. H3K36me3 and H3K79me2 are considered to be marks of transcriptional elongation. 
H3K27me3 and H3K9me3 are considered to be marks of inactive regions. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols. Briefly, cells were crosslinked, chromatin was extracted and sonicated using a Bioruptor sonicator (Diagenode) to an average size of 300-500 bp, and individual ChIP assays were performed using antibodies to modified histones. For the K562 and Ntera2 histone ChIP-seq samples, immunoprecipitates were collected using protein G-coupled magnetic beads; a detailed ChIP and library protocol can be found at http://www.roadmapepigenomics.org/protocols. For the U2OS histone ChIP-seq samples, immunoprecipitates were collected using StaphA cells; a detailed protocol can be found at http://expression.genomecenter.ucdavis.edu/chip.html. Library DNA was quantitated using either a Nanodrop or a BioAnalyzer and sequenced on an Illumina GA2. The sequencing reads were mapped to the genome using the Eland alignment program. ChIP-seq data was scored based on sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed where the signal height is the number of overlapping fragments at each nucleotide position in the genome. For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.05 when comparing the number of peaks above threshold to the number obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment).
The number of mapped tags in a putative binding region is compared to the normalized (normalized by correlating tag counts in genomic 10 kb windows) number of mapped tags in the same region from an input DNA control. Using a binomial test, only regions that have a p-value <= 0.05 are considered to be significantly enriched compared to the input DNA control.
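
The comparison against the input control can be sketched as a one-sided binomial test. The description does not spell out the exact formulation, so the version below is an assumption: given a scale factor that normalizes input tag counts to ChIP tag counts, each tag in the region is treated under the null as coming from the ChIP sample with probability scale / (1 + scale). The function name and arguments are hypothetical.

```python
from math import comb


def binomial_enrichment_pvalue(chip_tags, input_tags, scale=1.0):
    """One-sided binomial p-value for ChIP enrichment over input.

    scale is the assumed normalization factor (expected ChIP tags per
    input tag under the null). Returns P(X >= chip_tags) where
    X ~ Binomial(chip_tags + input_tags, scale / (1 + scale)).
    """
    n = chip_tags + input_tags
    p0 = scale / (1.0 + scale)
    return sum(
        comb(n, k) * p0 ** k * (1.0 - p0) ** (n - k)
        for k in range(chip_tags, n + 1)
    )


if __name__ == "__main__":
    p = binomial_enrichment_pvalue(chip_tags=40, input_tags=10)
    print(p, "enriched" if p <= 0.05 else "not enriched")
```

The exact sum is adequate for the tag counts of a single region; for very large counts a library routine for the binomial survival function would be more efficient.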

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli mailto:fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track is produced as part of the ENCODE project. The track displays the methylation status of specific CpG dinucleotides in the given cell types as identified by the Illumina Infinium HumanMethylation27 BeadArray platform (http://www.illumina.com/pages.ilmn?ID=243). In general, methylation of CpG sites within a promoter causes silencing of the gene associated with that promoter. Detailed information for the CpG targets is in an XLS formatted spreadsheet on the Myers' lab protocols website (http://hudsonalpha.org/myers-lab/protocols). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). Genomic DNA was isolated from each cell line with the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. The DNA concentration and quality of each preparation were determined by fluorescence with the Qubit Fluorometer (Invitrogen). The Methyl27K platform uses bisulfite treated genomic DNA to assay the methylation status of 27,578 CpG sites within more than 14,000 genes. When genomic DNA is treated with sodium bisulfite, unmethylated cytosines of CpG dinucleotides are converted into uracils; methylated cytosines are not converted. After bisulfite treatment, the methylation status of a site is assayed by single base-pair extension with a Cy3 or Cy5 labeled nucleotide on oligo-beads specific for the methylated or unmethylated state.
A beta value is calculated by Illumina's Bead Studio software for each CpG target. This value represents the intensity value from the methylated bead type divided by the sum of the intensity values from the methylated and unmethylated bead types for any given CpG target. The bisulfite conversion reaction was done using the Zymo Research EZ-96 DNA Methylation Kit (http://www.zymoresearch.com/epigenetics/dna-methylation/ez-96-dna-methylation-kit). One step of the protocol was modified. During the incubation, a 30 second 95 °C denaturing step every hour was included to increase reaction efficiency as recommended by the Illumina Infinium Human Methylation27 protocol. The bead arrays were run according to the protocol provided by Illumina (http://www.illumina.com/pagesnrn.ilmn?ID=275). The intensity data from the BeadArray was processed using Illumina's BeadStudio software with the Methylation Module v3.2. The data was then quality-filtered using p-values. Any beta value equal to or greater than 0.6 is considered fully methylated. Any beta value equal to or less than 0.2 is considered to be fully unmethylated. Beta values between 0.2 and 0.6 are considered to be partially methylated. Beta values are quality filtered, and spots that fall below the minimum intensity threshold are displayed as "NA". The score in the bed files is the beta value x 1000.
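
The beta value, the three methylation classes, and the bed score described above can be sketched as follows. This is an illustration of the stated arithmetic only, not Illumina's software; the function names are hypothetical, and rounding the score to an integer is an assumption.

```python
def beta_value(meth, unmeth):
    """Methylated intensity divided by the sum of both intensities."""
    total = meth + unmeth
    if total == 0:
        return None  # no signal at all; treated here as missing
    return meth / total


def classify_beta(beta):
    """Apply the 0.6 / 0.2 cutoffs given in the description."""
    if beta is None:
        return "NA"
    if beta >= 0.6:
        return "methylated"
    if beta <= 0.2:
        return "unmethylated"
    return "partially methylated"


def bed_score(beta):
    """Score written to the bed file: beta x 1000 (rounded, assumed)."""
    return None if beta is None else round(beta * 1000)


if __name__ == "__main__":
    for meth, unmeth in [(800, 200), (400, 600), (100, 900)]:
        b = beta_value(meth, unmeth)
        print(meth, unmeth, b, classify_beta(b), bed_score(b))
```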

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Jonathan Preall jpreall@cshl.edu (Generation 0 Data from Hannon Lab), Carrie Davis davisc@cshl.edu (experimental), Alex Dobin dobin@cshl.edu (computational), Wei Lin wlin@cshl.edu (computational), Tom Gingeras gingeras@cshl.edu (primary investigator)). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). hg18: This data was produced by the Hannon lab at Cold Spring Harbor as part of the ENCODE Project. The series depicts NextGen sequencing information for RNAs between the sizes of 20-200 nt isolated from RNA samples from tissues or subcellular compartments of cell lines. hg19: This track depicts NextGen sequencing information for RNAs between the sizes of 20-200 nt isolated from RNA samples from tissues or subcellular compartments from ENCODE cell lines. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. hg19: This cloning protocol generates directional libraries that are read from the 5' ends of the inserts, which should largely correspond to the 5' ends of the mature RNAs. The libraries were sequenced on a Solexa platform for a total of 36, 50 or 76 cycles; however, the reads undergo post-processing that trims their 3' ends. Consequently, the mapped read lengths are variable. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. hg18: Small RNAs between 20-200 nt were ribominus treated according to the manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs). The RNA was treated with Tobacco Alkaline Pyrophosphatase to eliminate any 5' cap structure.
Poly-A Polymerase was used to catalyze the addition of C's to the 3' end. The 5' ends were phosphorylated using T4 PNK and an RNA linker was ligated onto the 5' end. Reverse transcription was carried out using a poly-G oligo with a defined 5' extension. The inserts were then amplified using oligos targeting the 5' linker and poly-G extension and containing sequencing adapters. The library was sequenced on an Illumina GA machine for a total of 36, 50 or 76 cycles. Initially 1 lane is run. If an appreciable number of mappable reads are obtained, additional lanes are run. Sequence reads underwent quality filtration using the Illumina standard pipeline (GERALD). The read lengths may exceed the insert sizes and consequently introduce 3' adaptor sequence into the 3' end of the reads. The 3' sequencing adaptor was removed from the reads using a custom clipper program, which aligned the adaptor sequence to the short-reads, allowing up to 2 mismatches and no indels. Regions that aligned were "clipped" off from the read. The trimmed portions were collapsed into identical reads, their count noted and aligned to the human genome (NCBI build 36, hg18 unmasked) using Nexalign (Lassmann et al., not published). The alignment parameters are tuned to tolerate up to 2 mismatches with no indels and will allow for trimmed portions as small as 5 nucleotides to be mapped. We report reads that mapped 10 or fewer times. Data obtained from each lane is processed and mapped independently. The processed/mapped data from each lane is then compiled as a single track without additional processing and submitted to UCSC. Consequently, identical reads within a lane were collapsed and their value is reported as the "transfrag" signal value. However, the redundancy between lanes has not been eliminated so the same transfrag may appear multiple times within a signal.
hg19: Small RNAs between 20-200 nt were ribominus treated according to the manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs). The RNA was treated with Tobacco Alkaline Pyrophosphatase to eliminate any 5' cap structures. Poly-A Polymerase was used to catalyze the addition of C's to the 3' end. The 5' ends were phosphorylated using T4 PNK and an RNA linker was ligated onto the 5' end. Reverse transcription was carried out using a poly-G oligo with a defined 5' extension. The inserts were then amplified using oligos targeting the 5' linker and poly-G extension and containing sequencing adapters. The library was sequenced on an Illumina GA machine for a total of 36, 50 or 76 cycles. Initially, one lane was run. If an appreciable number of mappable reads were obtained, additional lanes were run. Sequence reads underwent quality filtration using the Illumina standard pipeline (GERALD). The Illumina reads were initially trimmed to discard any bases following a quality score less than or equal to 20 and converted into FASTA format, thereby discarding quality information for the rest of the pipeline. As a result, the sequence quality scores in the BAM output are all displayed as "40" to indicate no quality information. The read lengths may exceed the insert sizes and consequently introduce 3' adapter sequence into the 3' end of the reads. The 3' sequencing adapter was removed from the reads using a custom clipper program (available at http://hannonlab.cshl.edu/fastx_toolkit/), which aligned the adapter sequence to the short-reads using up to 2 mismatches and no indels. Regions that aligned were "clipped" off from the read. Terminal C nucleotides introduced at the 3' end of the RNA via the cloning procedure are also trimmed.
The trimmed portions were collapsed into identical reads, their count noted and aligned to the human genome (version hg19, using the gender build appropriate to the sample in question - female/male) using Bowtie (Langmead B. et al). The alignment parameter allowed 0, 1, or 2 mismatches iteratively. We report reads that mapped 20 or fewer times. Discrepancies between hg18 and hg19 versions of CSHL small RNA data: The alignment pipeline for the CSHL small RNA data was updated upon the release of the human genome version hg19, resulting in a few noteworthy discrepancies with the hg18 dataset. First, mapping was conducted with the open-source Bowtie algorithm (http://bowtie-bio.sourceforge.net/index.shtml) rather than the custom NexAlign software. As each algorithm uses different strategies to perform alignments, the mapping results may vary even in genomic regions that do not differ between builds. The read processing pipeline also varies slightly, in that we no longer retain information regarding whether a read was 'clipped' off adapter sequence.
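
The clipping and collapsing steps can be sketched in Python as below. This is not the clipper program referenced above: the function names, the adapter sequence in the usage example, and the min_overlap parameter (added so that a one- or two-base suffix does not count as an adapter match) are all assumptions for illustration. Only the rule of up to 2 mismatches and no indels comes from the description.

```python
from collections import Counter


def clip_adapter(read, adapter, max_mismatches=2, min_overlap=5):
    """Remove a 3' adapter found by ungapped comparison.

    Scans left to right for the first position where the rest of the
    read matches the start of the adapter with at most max_mismatches
    substitutions (no indels), and returns the read up to that point.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatches:
            return read[:start]
    return read  # no adapter detected


def collapse_reads(reads):
    """Collapse identical sequences and keep their counts."""
    return dict(Counter(reads))


if __name__ == "__main__":
    adapter = "CGCGTTGGCC"  # hypothetical adapter sequence
    raw = ["AAAAAAAAAACGCGTT", "AAAAAAAAAACGAGTT", "AAAAAAAAAAAA"]
    clipped = [clip_adapter(r, adapter) for r in raw]
    print(collapse_reads(clipped))
```

Collapsing before alignment means each distinct sequence is aligned once and its count carried along, which matches the "count noted" wording in the description.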

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Piero Carninci mailto:carninci@riken.jp). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track shows 5' cap analysis gene expression (CAGE) tags and clusters in RNA extracts (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=rnaExtract) from different sub-cellular localizations (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=localization) in multiple cell lines (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType). A CAGE cluster is a region of overlapping tags with an assigned value that represents the expression level. The data in this track were produced as part of the ENCODE Transcriptome Project. Release 2 has three new download-only files per experiment (Clusters, TSS Gencode 7 and TSS HMM) and four new cell lines (A459, AG04450, BJ and SK-N-SH_RA). Release 1 on hg19 contained the original data on hg18 (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeRikenCage) that was remapped and indicated in this release as Generation 0 since that data had no replicates. If there is both old and new generation data available for a particular experiment, only the new generation data is displayed and the older data is available for download. The new data for this track was generated with a different process and has standard replicate numbers. The replicate labeling in the genome browser view is a counter indicating the total number of replicates submitted. The producing lab has replicate numbers that correspond to their internal bio-replicate numbering. Where these two numbering systems conflict, both are listed in the long label of the specific track.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). RNA molecules longer than 200 nt were isolated from each subcellular compartment and then were fractionated into polyA+ and polyA- fractions as described in these protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/general/rnaExtracts.txt). The CAGE tags were sequenced from the 5' ends of cap-trapped cDNAs produced using RIKEN CAGE technology (Kodzius et al. 2006; Valen et al. 2009). To create the tag, a linker was attached to the 5' end of polyA+ or polyA- reverse-transcribed cDNAs which were selected by cap trapping (Carninci et al. 1996). The first 27 bp of the cDNA were cleaved using class II restriction enzymes. A linker was then attached to the 3' end of the cDNA. After PCR amplification, the tags were sequenced (36 bp single reads) using Illumina's Genome analyzer. Tags were mapped to the human genome (hg19) using the program delve (T. Lassmann manuscript in preparation). Delve is a new probabilistic aligner focused on giving the best possible alignment of reads to a genome rather than focusing on speed. It calculates the mapping accuracy (probability of each alignment being true or not) for each alignment. There is no set limit on the number of errors allowed and therefore the mapping rate is commonly 100%. However, for analysis it is recommended to discard alignments with low mapping qualities. Exceptions to the above protocol are the polyA- RNA samples from K562 cytosol, K562 nucleus, and prostate whole cell which were sequenced using ABI SOLiD (http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html) technology. These reads were mapped using Bowtie using default parameters. 
Clusters were defined as regions of overlapping CAGE reads. The expression level was computed as the number of reads making up the cluster, divided by the total number of reads sequenced, times 1 million.
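
The cluster definition and the expression formula can be sketched as follows. The function name and the half-open (start, end) read representation on a single chromosome strand are assumptions; the arithmetic (reads in the cluster divided by total reads sequenced, times 1 million) follows the description.

```python
def cage_clusters(reads, total_reads):
    """Merge overlapping reads into clusters and compute expression.

    reads: list of (start, end) half-open intervals on one chromosome
    strand. Returns (start, end, n_reads, expression) per cluster,
    where expression = n_reads / total_reads * 1,000,000.
    """
    clusters = []
    for start, end in sorted(reads):
        if clusters and start < clusters[-1][1]:
            # Overlaps the current cluster: extend it and count the read.
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2] += 1
        else:
            clusters.append([start, end, 1])
    return [
        (s, e, n, n * 1_000_000 / total_reads) for s, e, n in clusters
    ]


if __name__ == "__main__":
    toy = [(100, 127), (110, 137), (500, 527)]  # hypothetical 27 bp tags
    for cluster in cage_clusters(toy, total_reads=1_000_000):
        print(cluster)
```

Note that total_reads is the number of reads sequenced, not the number of reads passed in, so the value has to be supplied separately.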

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli mailto:fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track is produced as part of the ENCODE project. The track displays the methylation status of specific CpG dinucleotides in the given cell types as identified by the Illumina Infinium Human Methylation 450 Bead Array platform (http://www.illumina.com/products/methylation_450_beadchip_kits.ilmn). In general, methylation of CpG sites within a promoter causes silencing of the gene associated with that promoter. The Infinium Human Methylation 450 platform uses bisulfite treated genomic DNA to assay the methylation status of more than 450,000 CpG sites covering all designatable RefSeq genes, including promoter, 5' and 3' regions, without bias against those lacking CpG islands. Additionally, the assay includes CpG islands and shores, CpG sites outside of CpG islands, non-CpG methylated sites identified in human stem cells, differentially methylated sites identified in tumor versus normal (multiple forms of cancer) and across several tissue types, CpG islands outside of coding regions, miRNA promoter regions, and disease-associated regions identified through GWAS. Detailed information for the CpG targets is in a CSV-formatted spreadsheet in the supplemental directory (http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethyl450/supplemental/). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell).
Genomic DNA was isolated from each cell line with the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. The DNA concentration and quality of each preparation were determined by fluorescence with the Qubit Fluorometer (Invitrogen). Genomic DNA was treated with sodium bisulfite, converting unmethylated cytosines of CpG dinucleotides into uracils; methylated cytosines were not converted. After bisulfite treatment, the methylation status of a site was assayed by single base-pair extension with a Cy3 or Cy5 labeled nucleotide on oligo-beads specific for the methylated or unmethylated state. The bisulfite conversion reaction was done using the Zymo Research EZ-96 DNA Methylation Kit (http://www.zymoresearch.com/product/ez-96-dna-methylation-kit-d5003). One step of the protocol was modified. During the incubation, a 30 second 95 °C denaturing step every hour was included to increase reaction efficiency as recommended by the Illumina Infinium Human Methylation27 protocol. The bead arrays were run according to the protocol provided by Illumina (http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethyl450/supplemental/wgEncodeHaibMethyl450IlluminaProtocol.pdf). A beta value was calculated for each CpG target with Illumina's Bead Studio software with the Methylation Module v3.2. Beta value = intensity value from the methylated bead type / (intensity value from the methylated bead type + intensity value from the unmethylated bead type + 100). The data was then quality-filtered using p-values. Beta values with a p-value greater than 0.01 are considered to fall below the minimum intensity threshold and are displayed as "NA". Any beta value equal to or greater than 0.6 was considered fully methylated. Any beta value equal to or less than 0.2 was considered to be fully unmethylated. Beta values between 0.2 and 0.6 were considered to be partially methylated. The score in the bed files is the beta value x 1000.
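
The offset form of the beta value and the p-value filter used for this platform can be written as a short Python sketch. The function name and argument names are hypothetical; only the offset of 100 and the 0.01 p-value cutoff come from the description.

```python
def beta_450k(meth, unmeth, detection_p, offset=100, max_p=0.01):
    """Beta value with a constant offset, masked by detection p-value.

    Returns "NA" when detection_p exceeds max_p; otherwise
    meth / (meth + unmeth + offset).
    """
    if detection_p > max_p:
        return "NA"
    return meth / (meth + unmeth + offset)


if __name__ == "__main__":
    print(beta_450k(900, 0, detection_p=0.001))   # strong signal
    print(beta_450k(20, 0, detection_p=0.001))    # weak signal, damped
    print(beta_450k(900, 0, detection_p=0.5))     # fails the filter
```

The constant in the denominator pulls beta toward zero when both intensities are small, so weak probes do not produce extreme values from noise alone.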

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Richard Sandstrom mailto:sull@u.washington.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique depends upon the strength and tight nature of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique to define the DNase hypersensitive sites in the genome was adopted. Sequencing these next-generation-sequencing DNase samples to significantly higher depths of 300-fold or greater produces a base-pair level resolution of the DNase susceptibility maps of the native chromatin state. These base-pair resolution maps represent and are dependent upon the nature and the specificity of interaction of the DNA with the regulatory/modulatory proteins binding at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA binding proteins. 
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the GRCh37/hg19 human genome using Bowtie 0.12.5 (Eland was used to map to NCBI36/hg18); only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%) were designated for deep sequencing to a depth of over 200 million tags.
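
The Signal track described above (tag density in a 150 bp sliding window at a 20 bp step) can be sketched as follows. The function name and the representation of tags as single cleavage-site coordinates on one chromosome are assumptions for illustration.

```python
from bisect import bisect_left


def sliding_window_density(cut_sites, chrom_len, window=150, step=20):
    """Count tags in [start, start + window) for each window start.

    cut_sites: 0-based tag positions on one chromosome.
    Returns a list of (window_start, tag_count) tuples.
    """
    sites = sorted(cut_sites)
    densities = []
    last_start = max(chrom_len - window, 0)
    for start in range(0, last_start + 1, step):
        # Binary search gives the number of sites inside the window.
        count = bisect_left(sites, start + window) - bisect_left(sites, start)
        densities.append((start, count))
    return densities


if __name__ == "__main__":
    print(sliding_window_density([10, 30, 160], chrom_len=200))
```

Because the step is smaller than the window, neighbouring windows overlap and the resulting profile is smoothed.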

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Kevin White mailto:kpwhite@uchicago.edu (Principal Investigator), Subhradip Karmakar mailto:subhradip@uchicago.edu (Project Lead), Nick Bild mailto:nbild@bsd.uchicago.edu (Data Analyst), Alina Choudhury mailto:achoudhury@uchicago.edu (Laboratory Technician), Marc Domanus mailto:mdomanus@anl.gov (Sequencing Technician at Argonne National Lab)). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This ENCODE track maps human transcription factor binding sites, genome-wide using second generation massively parallel sequencing. This mapping uses expressed transcription factors as GFP tagged fusion proteins after BAC (Bacterial artificial chromosomes) recombineering. The U. of Chicago and Max Planck Institute (Dresden) pipeline generates recombineered (recombination-mediated genetic engineering) BACs for the production of cell lines or animals that express fusion proteins from epitope tagged transgenes. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols. (http://hgwdev/ENCODE/protocols/cell) Recombineering strategy: To facilitate high-throughput production of the transgenic constructs, the program BACFinder (Crowe, Rana et al. 2002) automatically selects the most suitable BAC clone for any given human gene and generates the sets of PCR primers required for tagging and verification (Poser, Sarov et al. 2008). Recombineering is used for tagging cassettes at either the N or C terminus of the protein. The N-terminal cassette has a dual eukaryotic-prokaryotic promoter (PGK-gb2) driving a neomycin-kanamycin resistance gene within an artificial intron inside the tag coding sequence. 
The selection cassette is flanked by two loxP sites and can be permanently removed by Cre recombinase-mediated excision. The C-terminal cassette contains the sequence encoding the tag followed by an internal ribosome entry site (IRES) in front of the neomycin resistance gene. In addition, a short bacterial promoter (Gb3) drives the expression of the neomycin-kanamycin resistance gene in E. coli. The tagging cassettes, containing 50 nucleotides of PCR-introduced homology arms, are inserted into the BAC by recombineering, either behind the start codon (for the N-terminal tag) or in front of the stop codon (for the C-terminal tag) of the gene. E. coli cells that have successfully recombined the cassette are selected for kanamycin resistance in liquid culture. Each saturated culture from a specific recombineering reaction derived 10-200 independent recombination events. Checking two independent clones for each PCR through the tag insertion point, 97% (85/88) yielded a PCR product of the expected size. Most of the clones that failed to grow were missing the targeted genomic region. An estimated 10% of the BACs used are chimeric, rearranged or wrongly mapped. Thus, initial results indicate that the necessary recombineering steps can be carried out with high fidelity. The White lab produced all epitope tagged transcription and chromatin factor BACs, as well as the genome wide ChIP data and analysis. An application of this approach to the analysis of closely related paralogs (RARa and RARg) yielded transcription factors, chromatin factors, cell lines, ChIP chip data and ChIP-seq data (Hua, Kittler et al. Cell 2009). Such paralogous transcription factors often cannot otherwise be distinguished by antibodies. Sample Preparation: ChIP DNA from samples is sheared to ~800 bp using a nebulizer. The ends of the DNA are polished, and two unique adapters are ligated to the fragments.
Ligated fragments of 150-200 bp are isolated by gel extraction and amplified using limited cycles of PCR. Sequencing System: Illumina GAIIx and HiSeq next-generation sequencing produced all ChIP-seq data. Processing and Analysis Software: Raw sequencing reads are aligned using Bowtie version 0.12.5 (Langmead et al. 2009). The "-m 1" parameter is applied to suppress alignments mapping more than once in the genome. Reads are aligned to the UCSC hg19 assembly. Wiggle format signal files are generated with SPP 2.7.1 for R 2.7.1. MACS 1.3.7 is used to call peaks. The MACS parameters used vary by experiment. The White lab used goat anti-GFP antibody to perform ChIP in untagged K562 cells as a background control. The test IP was performed in the same way as the background control. Results are expressed as values of the test normalized to the background.
