Project description:One clear hallmark of mammalian promoters is the presence of CpG islands (CGIs) at more than two thirds of genes, whereas TATA boxes are present at only a minority of promoters. Using genome-wide approaches, we show that GC content and CGIs are major promoter elements in mammalian cells, able to govern open chromatin conformation and support paused transcription. First, we define three classes of promoters with distinct transcriptional directionality and pausing properties that correlate with their GC content. We further analyze the direct influence of GC content on nucleosome positioning and depletion, and show that CGIs correlate with nucleosome depletion both in vivo and in vitro. We also show that transcription is not essential for nucleosome exclusion but influences both a weak +1 and a well-positioned nucleosome at CGI borders. Altogether, our data support the idea that CGIs have become an essential feature of promoter structure defining novel regulatory properties in mammals. Nucleosome density and positioning were studied by high-throughput sequencing of DNA previously treated with MNase. In parallel, ChIP-seq for PolII and H3K27ac was performed in mouse and human under different conditions to assess a potential effect of transcription on nucleosome properties. Sonicated genomic DNA was sequenced to quantify and exclude sequencing bias in the largest CGI regions studied in this article.
Project description:One clear hallmark of mammalian promoters is the presence of CpG islands (CGIs) at more than two thirds of genes, whereas TATA boxes are present at only a minority of promoters. Using genome-wide approaches, we show that GC content and CGIs are major promoter elements in mammalian cells, able to govern open chromatin conformation and support paused transcription. First, we define three classes of promoters with distinct transcriptional directionality and pausing properties that correlate with their GC content. We further analyze the direct influence of GC content on nucleosome positioning and depletion, and show that CGIs correlate with nucleosome depletion both in vivo and in vitro. We also show that transcription is not essential for nucleosome exclusion but influences both a weak +1 and a well-positioned nucleosome at CGI borders. Altogether, our data support the idea that CGIs have become an essential feature of promoter structure defining novel regulatory properties in mammals. Nucleosome density and positioning were studied by high-throughput sequencing of DNA previously treated with MNase. In parallel, ChIP-seq for PolII and H3K27ac was performed in mouse and human under different conditions to assess a potential effect of transcription on nucleosome properties. To exclude a possible MNase digestion bias or sequencing artefact within CGIs, we performed additional controls such as nucleosome mapping using an alternative method based on DNA digestion by a chemical agent (phenanthroline) in human Raji B-cells.
Project description:To understand the biosynthesis of C. majus BIAs, we performed de novo transcriptome sequencing of leaf and root tissues of C. majus using Illumina high-throughput sequencing technology.
2018-07-20 | GSE117393 | GEO
Project description:GC Bias on the SOLiD5500xl system
Project description:We present GCparagon, a two-stage algorithm for computing and correcting GC biases in cell-free DNA (cfDNA) samples. The length of the highly fragmented cfDNAs and the number of GC bases are essential parameters in the calculations. Regions of low mappability, known reference genome assembly errors and regions surrounding assembly gaps are excluded from the bias computation. GCparagon outputs a bias matrix and an optional tagged BAM file with GC bias balance weights as alignment tags. Parallelization allows calculation of a GC bias estimate in less than 2 minutes per sample with between 99.0% and 99.9% of fragments already corrected. We propose that GCparagon can help standardize cfDNA applications and evaluate the impact of GC bias on algorithms used in the analysis of liquid biopsy data.
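The idea of a bias matrix indexed by fragment length and GC base count can be illustrated with a minimal sketch. This is not GCparagon's implementation: the function names `gc_count` and `bias_weights` are hypothetical, the inputs are plain sequence strings rather than BAM alignments, and the weight is assumed to be the simple ratio of the expected to the observed fraction of fragments in each (length, GC count) bin.

```python
from collections import Counter


def gc_count(seq):
    """Number of G and C bases in a fragment sequence."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def bias_weights(observed, expected):
    """Return {(length, gc_count): weight} for bins seen in both inputs.

    observed: fragment sequences from the sample.
    expected: fragment sequences drawn from the reference (assumed unbiased).
    Assumption: weight = expected fraction / observed fraction.
    """
    obs = Counter((len(s), gc_count(s)) for s in observed)
    exp = Counter((len(s), gc_count(s)) for s in expected)
    n_obs = sum(obs.values())
    n_exp = sum(exp.values())
    weights = {}
    for key, o in obs.items():
        e = exp.get(key, 0)
        if e == 0:
            continue  # no expectation for this bin; leave it unweighted
        weights[key] = (e / n_exp) / (o / n_obs)
    return weights


if __name__ == "__main__":
    sample = ["GGCC"] * 3 + ["ATAT"]
    reference = ["GGCC"] * 2 + ["ATAT"] * 2
    print(bias_weights(sample, reference))
```

In this toy input, GC-rich fragments are over-represented in the sample, so their bin receives a weight below 1 and the AT-rich bin a weight above 1.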
Project description:To characterize the rules governing exon recognition during splicing, we analyzed RNA-seq datasets and identified ~4,000 GC-rich and ~5,000 AT-rich exons, labelled GC-exons and AT-exons, respectively, whose inclusion depends on different sets of splicing factors. We show that a high GC-load is associated with predicted RNA secondary structures at the 5'ss and that GC-exons are dependent on U1 snRNP-associated proteins. Meanwhile, a high AT-load is associated with a large number of decoy splicing-related signals upstream of exons, such as branchpoints and SF1- or U2AF65-binding sites, and AT-exons are dependent on U2 snRNP-associated proteins. Nucleotide composition bias also influences local chromatin organization. Since the GC content of exons correlates with that of their host genes, isochores and topologically associated domains, we propose that regional nucleotide composition bias leaves a footprint locally, at the exon level, inducing, during splicing, constraints that are alleviated by the local chromatin organization and specific splicing factors.
Project description:Coupling molecular biology to high-throughput sequencing has revolutionized the study of biology. Molecular genomics techniques are continually refined to provide higher-resolution mapping of nucleic acid interactions and nucleic acid structure. These assays are converging on single-nucleotide resolution measurements, but the sequence preferences of molecular biology enzymes can interfere with the accurate interpretation of the data. Enzymatic sequence preferences manifest more prominently as the resolution of these assays increases. We developed seqOutBias to seek out enzymatic sequence bias from experimental data and scale individual sequence reads to correct the bias. We show that this software efficiently and successfully corrects the sequence bias in DNase-seq, TACh-seq, ATAC-seq, MNase-seq, and PRO-seq data.
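Scaling reads to offset an enzyme's sequence preference can be sketched as follows. This is an illustration under stated assumptions, not seqOutBias's code: the helper names `kmer_counts` and `scale_factors` are hypothetical, the genome is a single in-memory string, and each k-mer's scale factor is assumed to be its background frequency in the genome divided by its frequency at the observed cut positions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def scale_factors(genome, cut_positions, k):
    """Return {kmer: factor} for k-mers observed at cut positions.

    Assumption: factor = background frequency / observed frequency, so
    k-mers the enzyme prefers are scaled down and disfavoured ones up.
    """
    background = kmer_counts(genome, k)
    observed = Counter(
        genome[p:p + k] for p in cut_positions if 0 <= p and p + k <= len(genome)
    )
    n_bg = sum(background.values())
    n_obs = sum(observed.values())
    return {
        kmer: (background[kmer] / n_bg) / (count / n_obs)
        for kmer, count in observed.items()
    }


if __name__ == "__main__":
    print(scale_factors("ACGTACGT", [0, 4, 3], k=2))
```

Each read would then contribute its k-mer's factor instead of a count of 1 when signal is accumulated; a real tool would also need to handle strand, mappability and the offset between the read start and the cut site.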