Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Barbara Wold, woldb@caltech.edu; Georgi K. Marinov, georgi@caltech.edu; Diane Trout, diane@caltech.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in the behavior or structure of a cell or organism in response to changes in DNA sequence to infer the function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret them as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with moderate breadth. Assays identical to those being used in the ENCODE project are performed in mouse cell types that are similar or homologous to those studied in the human project.
Thus, we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The relative contributions of DNA with a function preserved in mammals and DNA with a function in only one species will be determined. The results will have a significant impact on our understanding of the evolution of gene regulation.

Maps of Occupancy by Transcription Factors

Genome-wide occupancy maps of transcription factors (TFs) are generated by ChIP-seq. A ChIP-seq experiment combines chromatin immunoprecipitation (ChIP), which enriches genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibody), with high-throughput short-read sequencing of the enriched DNA fragments (Wold & Myers, 2008). Proteins are crosslinked to DNA (usually with formaldehyde), the chromatin is sheared, and the fragments are immunoprecipitated with the antibody of interest. The immunoprecipitated material is turned into a sequencing library and sequenced. The sequencing reads are then aligned to the genome. A control sample, consisting of sonicated chromatin that has either not been immunoprecipitated or been immunoprecipitated with a non-specific immunoglobulin, is also sequenced. The ChIP and control datasets are analyzed with a variety of software packages to identify regions occupied by the target protein. The sequencing data, alignments and analysis files for these experiments are available for download. Specifically, the Ren lab examined RNA polymerase II (PolII), the co-activator protein p300, the insulator protein CTCF, and two chromatin modification marks, H3K4me3 and H3K4me1, because of their demonstrated utility in identifying promoters, enhancers and insulator elements (Barski et al., 2007; Blow et al., 2010; Heintzman et al., 2009; Kim et al., 2007; Kim et al., 2005a; Visel et al., 2009).
Enrichment of H3K4me3 or PolII signals is a strong indicator of an active promoter, while the presence of p300 or H3K4me1 outside of promoter regions has been used as a mark for enhancers. CTCF binding sites are considered a mark for potential insulator elements. For each transcription factor or chromatin mark in each tissue, ChIP-seq was carried out with at least two biological replicates. Each experiment produced 20-30 million monoclonal, uniquely mapped tags.

Maps of histone modifications

Levels of three histone modifications are being determined. H3K4me1 (monomethylation of lysine 4 of histone H3) is a mark of active chromatin and, in the absence of H3K4me3, is one indicator of an enhancer. H3K4me3 (trimethylation of lysine 4 of histone H3) is highly enriched at active promoters. One repressive (Polycomb) mark, H3K27me3, is associated with some silenced genes. Maps of genomic DNA in chromatin with these histone modifications are generated by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short-read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009). A graph of the enrichment for histone modifications is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the probable binding sites deduced by the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.
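The "Signal" track is described above as essentially the counts of mapped reads per interval. A minimal sketch of that idea follows, assuming each mapped read is reduced to a (chromosome, 0-based start) pair. The function name signal_track, the interval_size parameter and its 100 bp default are illustrative assumptions, not details taken from the submitting laboratories' pipeline.

```python
from collections import Counter


def signal_track(read_starts, interval_size=100):
    """Count mapped reads in fixed-width, non-overlapping intervals.

    read_starts   -- iterable of (chrom, start) pairs, 0-based start positions
    interval_size -- interval width in base pairs (hypothetical default)

    Returns a dict mapping (chrom, interval_start) to the number of reads
    whose start position falls inside that interval.
    """
    if interval_size <= 0:
        raise ValueError("interval_size must be positive")
    counts = Counter()
    for chrom, start in read_starts:
        # Snap the read start down to the beginning of its interval.
        interval_start = (start // interval_size) * interval_size
        counts[(chrom, interval_start)] += 1
    return dict(counts)


# Toy reads, invented for illustration only.
reads = [("chr1", 10), ("chr1", 95), ("chr1", 130), ("chr2", 5)]
signal = signal_track(reads, interval_size=100)
```

Here the two reads starting at positions 10 and 95 on chr1 fall in the same 100 bp interval, so that interval receives a count of 2. A browser track may also extend reads to the estimated fragment length or normalize for sequencing depth; this sketch does neither.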
Maps of Occupancy by Transcription Factors

Maps of occupancy of genomic DNA by transcription factors (TFs) are determined by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short-read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009). A graph of the enrichment for TF binding is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the probable binding sites deduced by the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
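Each ChIP experiment in this record is paired with an input control from the same cell type. As a rough illustration of how the two can be compared, the sketch below computes a depth-scaled fold enrichment per interval from two tables of read counts. This is not the MACS algorithm (MACS uses its own statistical model); the function name fold_enrichment, the pseudocount and the toy counts are all assumptions made for the example.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-interval ChIP/input ratio after scaling input to ChIP depth.

    chip_counts, input_counts -- dicts mapping an interval key to a read count
    pseudocount               -- added to both terms to avoid division by zero
                                 (hypothetical default)
    """
    chip_total = sum(chip_counts.values())
    input_total = sum(input_counts.values())
    if input_total == 0:
        raise ValueError("input control contains no reads")
    # Scale the control so both libraries have the same total read count.
    scale = chip_total / input_total
    enrichment = {}
    for interval in chip_counts.keys() | input_counts.keys():
        chip = chip_counts.get(interval, 0)
        control = input_counts.get(interval, 0) * scale
        enrichment[interval] = (chip + pseudocount) / (control + pseudocount)
    return enrichment


# Toy counts, invented for illustration only.
chip = {("chr1", 0): 30, ("chr1", 100): 10}
control = {("chr1", 0): 10, ("chr1", 100): 10}
fe = fold_enrichment(chip, control)
```

In this toy case the first interval comes out above 1 (enriched in the ChIP sample) and the second below 1 (depleted relative to the scaled input). A real peak caller additionally assesses statistical significance rather than relying on a raw ratio.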
2012-04-19 | GSE36023 | GEO