ABSTRACT: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu).

Rationale for the Mouse ENCODE project: Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetic approaches alter a DNA sequence and infer its function from the resulting changes in the behavior or structure of a cell or organism. Biochemical approaches monitor histone modification states, binding of specific transcription factors, accessibility to DNases, and other epigenetic features along genomic DNA. In general, these features are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary: comparisons among homologous DNA sequences identify segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. Such segments are inferred to be under negative or positive selection, respectively, and are interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (a prediction best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with moderate breadth. Assays identical to those used in the human ENCODE project are performed in mouse cell types that are similar or homologous to those studied in the human project.
The comparison will be used to discover which epigenetic features are conserved between mouse and human, and to examine the extent to which these features overlap DNA sequences under negative selection. It will reveal the relative contributions of functional DNA preserved across mammals and of function present in only one species. The results will have a significant impact on the understanding of the evolution of gene regulation.

Maps of DNaseI Sensitivity: DNaseI has long been used to map general chromatin accessibility, and DNaseI hypersensitivity is a universal feature of active cis-regulatory sequences. Genome-wide maps of DNaseI sensitivity are generated through DNaseI digestion, addition of linkers at the sites of cleavage, library preparation, and massively parallel short-read sequencing on the Illumina GAIIx and HiSeq platforms. The sequence tags are mapped back to the mouse genome, and a graph of the smoothed kernel density of DNaseI cleavage sites is displayed as the "Signal" track. This provides a quantitative estimate of the frequency of cleavage by DNaseI in the initial digest, which in turn reflects the accessibility of the DNA in chromatin. Segments with the greatest cleavage-site density represent DNase hypersensitive sites (DHSs) and are identified as peaks by the F-seq program (Boyle et al., 2008). DHSs are candidates for any type of cis-regulatory module, including promoters, enhancers, insulators, and novel elements. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Cells were grown and harvested according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse) for G1E and G1E-ER4.
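The "Signal" track described above, a smoothed kernel density of DNaseI cleavage sites, can be illustrated with a small Python sketch. It assumes that each mapped 20-bp tag begins at the DNaseI cut, so a '+' strand alignment marks the cleavage site at its leftmost base and a '-' strand alignment at its rightmost base, and it smooths the per-base cut counts with a Gaussian kernel. The tuple layout, the names `cleavage_sites` and `kernel_density`, the default bandwidth, and the four-bandwidth truncation are hypothetical choices for illustration, not F-seq's actual interface or parameters.

```python
import math
from collections import Counter


def cleavage_sites(alignments):
    """Count DNaseI cut sites from aligned tags.

    Each alignment is a hypothetical (chrom, start, length, strand) tuple with a
    0-based start. ASSUMPTION: the tag begins at the DNaseI cut, so the cut is
    the leftmost base of a '+' alignment and the rightmost base of a '-' one.
    """
    sites = Counter()
    for chrom, start, length, strand in alignments:
        pos = start if strand == "+" else start + length - 1
        sites[(chrom, pos)] += 1
    return sites


def kernel_density(sites, chrom, start, end, bandwidth=150.0):
    """Gaussian-kernel density of cut sites at each base in [start, end).

    Each cut contributes a kernel of unit area (before truncation), so the
    result is not normalized by library size. Kernels are truncated at four
    bandwidths from their centre to keep the loop short.
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    reach = int(4 * bandwidth)
    density = [0.0] * (end - start)
    for (site_chrom, pos), count in sites.items():
        if site_chrom != chrom:
            continue
        lo = max(start, pos - reach)
        hi = min(end, pos + reach + 1)
        for x in range(lo, hi):
            z = (x - pos) / bandwidth
            density[x - start] += count * norm * math.exp(-0.5 * z * z)
    return density


# Example: two nearby '+' tags and one '-' tag in a toy chr7 window.
tags = [("chr7", 1000, 20, "+"), ("chr7", 1010, 20, "+"), ("chr7", 1200, 20, "-")]
sites = cleavage_sites(tags)
signal = kernel_density(sites, "chr7", 900, 1400, bandwidth=50.0)
peak = max(range(len(signal)), key=signal.__getitem__)
print("densest base:", 900 + peak)
```

The densest base falls between the two '+' tags, where their kernels overlap; this density is what a peak caller then scores. A genome-scale tool would stream sorted alignments rather than loop over every site for each window.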
DNaseI hypersensitive sites were isolated using the DNase-seq or DNase-chip method (Song and Crawford, 2010). Briefly, cells were lysed with NP40, and intact nuclei were digested with optimal levels of DNaseI enzyme. DNaseI-digested ends were captured from three different DNase concentrations, and the material was sequenced on the Illumina platform. DNase-seq reads are 20 bases long because of an MmeI cutting step applied to the approximately 50 kb DNA fragments extracted after DNaseI digestion. Sequences from each experiment were mapped to the mouse genome (mm9 assembly) using the program Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) (Langmead et al., 2009). Reads mapping to more than one location were not removed; for such reads, only the best mapping result was used (the "--best" option). Sequences from multiple lanes were combined for a single replicate and converted to the SAM/BAM format using SAMtools (http://samtools.sourceforge.net/). F-seq was then used to convert the resulting digital signal into a continuous wiggle track, applying Parzen kernel density estimation to compute per-base-pair scores (Boyle et al., 2008). Discrete DNaseI HS sites (peaks) were identified from the DNase-seq F-seq density signal. Significant regions were determined by fitting the data to a gamma distribution to calculate p-values.
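The final step, scoring density values against a fitted gamma distribution, can be sketched in plain Python. This sketch fits the gamma distribution by the method of moments to a set of background density values, then evaluates the upper-tail probability with the standard series and continued-fraction expansions of the regularized incomplete gamma function. The choice of background values, the moment-based fit, the threshold `alpha`, the merging of adjacent significant bases into runs, and all function names are assumptions for illustration; they are not necessarily how the F-seq peak-calling step was configured.

```python
import math


def fit_gamma_moments(values):
    """Method-of-moments gamma fit: shape = mean^2 / var, scale = var / mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean * mean / var, var / mean


def _prefactor(a, x):
    # x^a * e^(-x) / Gamma(a), computed in log space to avoid overflow.
    return math.exp(-x + a * math.log(x) - math.lgamma(a))


def _lower_series(a, x):
    # Series for the regularized lower incomplete gamma P(a, x); used when x < a + 1.
    term = total = 1.0 / a
    n = 0
    while abs(term) > abs(total) * 1e-15:
        n += 1
        term *= x / (a + n)
        total += term
    return total * _prefactor(a, x)


def _upper_cf(a, x, tiny=1e-300):
    # Modified Lentz continued fraction for the regularized upper Q(a, x); used when x >= a + 1.
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * _prefactor(a, x)


def gamma_sf(x, shape, scale):
    """Upper-tail probability P(X >= x) for X ~ Gamma(shape, scale)."""
    z = x / scale
    if z <= 0:
        return 1.0
    if z < shape + 1.0:
        return 1.0 - _lower_series(shape, z)
    return _upper_cf(shape, z)


def significant_regions(density, background, alpha=1e-4):
    """Half-open (start, end) index runs where the gamma p-value is below alpha."""
    shape, scale = fit_gamma_moments(background)
    regions, run_start = [], None
    for i, value in enumerate(density):
        hit = gamma_sf(value, shape, scale) < alpha
        if hit and run_start is None:
            run_start = i
        elif not hit and run_start is not None:
            regions.append((run_start, i))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(density)))
    return regions


# Toy example: background densities and a track with two high-density runs.
background = [1.0, 2.0, 3.0, 4.0, 5.0]
density = [1.0, 2.0, 30.0, 40.0, 2.0, 1.0, 50.0]
print(significant_regions(density, background))
```

With these toy values the function reports the runs at indices 2 to 3 and at index 6, as the half-open intervals (2, 4) and (6, 7). Splitting the tail computation between the series and the continued fraction keeps very small p-values accurate instead of losing them to the cancellation in 1 - P(a, x).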