Project description: The pluripotent state of embryonic stem cells (ESCs) is produced by active transcription of cell identity genes and repression of genes encoding lineage-specifying developmental regulators. Here we use large ESC cohesin ChIA-PET datasets and other genomic data to identify the local chromosomal structures at both active and repressed genes across the genome. The results show that super-enhancer-driven cell identity genes generally occur within large loops that are connected through CTCF-CTCF interaction sites occupied by cohesin. Smc1 ChIA-PET data from wild-type murine embryonic stem cells (V6.5) were generated by deep sequencing on the Illumina HiSeq 2000.
Project description: Structural variations (SVs) contribute significantly to the variability of the human genome, and extensive genomic rearrangements are a hallmark of cancer. Genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs. The current application of PET sequencing with short-insert DNA is insufficient for the comprehensive mapping of SVs in low-complexity and repeat-rich genomic regions. We have developed a robust procedure to generate PET sequencing data using large DNA inserts of 10-20 kb for the identification of SVs. We compared the characteristics of the large-insert libraries with those of short-insert (1 kb) libraries at the same sequencing depth and cost. Although short-insert libraries have an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. Large inserts are superior to short inserts in providing higher physical genome coverage and therefore achieve greater sensitivity for the identification of the different types of SVs, including copy-number-neutral and complex events. Further, large inserts allow the identification of SVs within repetitive sequences that cannot be spanned by short inserts. Structural variations were profiled in three cancer cell lines using short (1 kb) and long (10 kb and 20 kb) insert-size DNA fragments.
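The claim that large inserts give higher physical genome coverage at the same sequencing depth can be illustrated with simple arithmetic. The sketch below is not taken from the study: it assumes the usual definition of physical coverage (number of paired-end tags times insert size, divided by genome size), an approximate human genome size of 3 Gb, and a hypothetical library of 30 million mapped tag pairs. The function name `physical_coverage` and all numbers are illustrative assumptions.

```python
def physical_coverage(num_pairs: int, insert_size_bp: int, genome_size_bp: int) -> float:
    """Average number of paired-end-tag inserts spanning any given base.

    Assumes inserts are spread uniformly over the genome, which is a
    simplification made only for illustration.
    """
    return num_pairs * insert_size_bp / genome_size_bp


# Assumed values, not taken from the study.
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size
NUM_PAIRS = 30_000_000          # hypothetical number of mapped tag pairs

# Insert sizes named in the description: 1 kb, 10 kb and 20 kb.
for insert_size_bp in (1_000, 10_000, 20_000):
    coverage = physical_coverage(NUM_PAIRS, insert_size_bp, GENOME_SIZE_BP)
    print(f"{insert_size_bp // 1000:>2} kb inserts: {coverage:.0f}x physical coverage")
```

With the number of sequenced tag pairs held fixed, physical coverage under this definition scales linearly with insert size, so a 20 kb library spans each position about 20 times more often than a 1 kb library. That is the sense in which a rearrangement breakpoint is more likely to be bridged by at least one large insert.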