Project description:We mapped the locations of DNA segments occupied by GATA1 using chromatin immunoprecipitation (ChIP). We have produced genome-wide GATA1 ChIP datasets after restoration and activation in G1E-ER4 cells. We employed the sequence census methodology of ChIP-seq, using Illumina GA2 technology to produce 23 million reads (36 nucleotides long) uniquely mapped to the mouse genome (mm8 assembly) for the GATA1 ChIP DNA and 15 million mapped reads for the input DNA. Examination of transcription factor GATA1 occupancy.
Project description:We mapped the locations of DNA segments occupied by GATA1 using chromatin immunoprecipitation (ChIP). We have produced genome-wide GATA1 ChIP datasets after restoration and activation in G1E-ER4 cells. We employed the sequence census methodology of ChIP-seq, using Illumina GA2 technology to produce 23 million reads (36 nucleotides long) uniquely mapped to the mouse genome (mm8 assembly) for the GATA1 ChIP DNA and 15 million mapped reads for the input DNA.
Project description:The high level of human genome structural variation among individuals suggests that there must be portions of the genome that have yet to be discovered, annotated and characterized at the sequence level. Using clone resources developed as part of the Human Genome Structural Variation Sequencing Project, we focused on the characterization of 2,363 novel sequence contigs not present in the human reference genome. We determined that these contigs corresponded to 720 distinct loci of which 400 now have an anchored position in the reference genome. We investigated the sequence properties of these loci and determined that 37% of these novel insertions are copy-number polymorphic. We find that they are significantly enriched within the last 5 Mb of chromosomes (a 2.9-fold enrichment, p=1.0e-18, binomial test) and that most arose as a result of deletions in the human lineage after separation from the African great apes. A subset of these sites shows evidence of marked population stratification among Asian, African and European populations, including a 3.9-kb insertion within the first intron of the lactase gene. Complete sequencing of clones from 192 genomic loci, including 156 completely spanned insertions, provides a detailed and contextual view of 1.67 Mb of inserted sequence. Analysis of this sequence identified 477 elements that show evidence of sequence constraint over evolutionary time, as well as matches to 22 RefSeq gene segments. Twenty-six of the insertions contain matches against mRNA-seq data indicating the potential presence of functionally important, unannotated human sequences. Taking advantage of this high-quality sequence, we develop a method to accurately genotype these novel insertions using next-generation whole-genome sequencing datasets.
Project description:The pluripotent state of embryonic stem cells (ESCs) is produced by active transcription of cell identity genes and repression of genes encoding lineage-specifying developmental regulators. Here we use large ESC cohesin ChIA-PET datasets and other genomic data to identify the local chromosomal structures at both active and repressed genes across the genome. The results show that super-enhancer-driven cell identity genes generally occur within large loops that are connected through CTCF-CTCF interaction sites occupied by cohesin. H3K27me3 ChIP-seq data from wild-type murine embryonic stem cells (V6.5) were generated by deep sequencing using Illumina Hi-Seq 2000.
Project description:This data was produced by the Wold lab at Caltech as part of the ENCODE Project. RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high throughput DNA sequencing. The resulting sequence reads are then informatically mapped onto the genome sequence. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf