Project description: A cDNA library was constructed by Novogene (CA, USA) using a Small RNA Sample Prep Kit, and Illumina sequencing was conducted according to the company workflow, to a depth of 20 million reads. Raw data were filtered for quality, retaining reads with a quality score > 5, an N content < 10%, no 5' primer contaminants, and both a 3' primer and an insert tag. The 3' primer sequence was trimmed, and reads with poly-A/T/G/C sequences were removed.
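The filtering criteria above can be illustrated with a minimal Python sketch. It is not the Novogene pipeline: the reading of "quality score > 5" as a mean Phred score, the definition of a poly-A/T/G/C read as a homopolymer run of `poly_run` bases, the function name `filter_and_trim`, and the adapter sequence in the usage note are all assumptions made for illustration. The 5' primer contaminant check is omitted because the primer sequence is not given.

```python
import re


def filter_and_trim(seq, quals, adapter_3p, min_mean_q=5.0,
                    max_n_frac=0.10, poly_run=10):
    """Return the trimmed insert, or None if the read fails any filter.

    seq        : read sequence (upper-case string)
    quals      : per-base Phred scores (list of ints, same length as seq)
    adapter_3p : 3' adapter sequence (hypothetical; not given in the source)
    """
    if not seq or len(seq) != len(quals):
        return None
    # Assumption: "quality score > 5" is read as a mean Phred score above 5.
    if sum(quals) / len(quals) <= min_mean_q:
        return None
    # "N < 10%": the fraction of ambiguous bases must stay below the threshold.
    if seq.count("N") / len(seq) >= max_n_frac:
        return None
    # Require the 3' adapter and a non-empty insert in front of it.
    pos = seq.find(adapter_3p)
    if pos <= 0:
        return None
    insert = seq[:pos]  # trimming: keep only the bases before the adapter
    # Assumption: a poly-A/T/G/C read has a homopolymer run of poly_run bases.
    pattern = "|".join("%s{%d}" % (base, poly_run) for base in "ACGT")
    if re.search(pattern, insert):
        return None
    return insert
```

For example, with a made-up adapter `"TGGAATTC"`, calling `filter_and_trim("ACGTACGTACGTACGTACGT" + "TGGAATTC", [30] * 28, "TGGAATTC")` returns the 20-base insert, while the same read with all quality scores set to 2 is rejected.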
Project description: We use nucleosome maps obtained by high-throughput sequencing to study sequence specificity of intrinsic histone-DNA interactions. In contrast with previous approaches, we employ an analogy between a classical one-dimensional fluid of finite-size particles in an arbitrary external potential and arrays of DNA-bound histone octamers. We derive an analytical solution to infer free energies of nucleosome formation directly from nucleosome occupancies measured in high-throughput experiments. The sequence-specific part of free energies is then captured by fitting them to a sum of energies assigned to individual nucleotide motifs. We have developed hierarchical models of increasing complexity and spatial resolution, establishing that nucleosome occupancies can be explained by systematic differences in mono- and dinucleotide content between nucleosomal and linker DNA sequences, with periodic dinucleotide distributions and longer sequence motifs playing a secondary role. Furthermore, similar sequence signatures are exhibited by control experiments in which genomic DNA is either sonicated or digested with micrococcal nuclease in the absence of nucleosomes, making it possible that current predictions based on high-throughput nucleosome positioning maps are biased by experimental artifacts. Included are raw (eland) and mapped (wig) reads. The mapped reads are provided in eland and wiggle formats, and the raw reads are included in the eland file. This series includes only MNase control data. The sonicated control is part of the following already published accession, as is an in vitro nucleosome map: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15188 We also studied data (in vitro and in vivo maps as well as a model) from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13622 and from http://www.ncbi.nlm.nih.gov/sra/?term=SRA001023
Project description: Deep sequencing of protein-coding and non-protein-coding RNAs from mouse differentiated embryonic stem cells and 14.5 dpc mouse fetal head. Analysis of RiboMinus RNA from mouse differentiated embryonic stem cells and 14.5 dpc mouse fetal head. There are no processed data files for GSM566806-GSM566811. There are no fastq raw data files for GSM566812 and GSM566813, since these samples are the combined reads from all sequence lanes.
Project description: ChIP-seq and input sequence data used in the development and evaluation of the BEADS normalization method. Examination of ChIP and input sequence reads across the worm genome.
Project description: Whole exome sequencing of 5 HCLc tumor-germline pairs. Genomic DNA from HCLc tumor cells and, for germline, from T-cells was used. Whole exome enrichment was performed with either Agilent SureSelect (50Mb, samples S3G/T, S5G/T, S9G/T) or Roche Nimblegen (44.1Mb, samples S4G/T and S6G/T). The resulting exome libraries were sequenced on the Illumina HiSeq platform with paired-end 100bp reads to an average depth of 120-134x. BAM files were generated using NovoalignMPI (v3.0) to align the raw fastq files to the reference genome sequence (hg19) and Picard tools (v1.34) to flag duplicate reads (optical or PCR), unmapped reads, reads mapping to more than one location, and reads failing vendor QC.
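The read flags mentioned above are stored in the FLAG field of each BAM record. The sketch below decodes the relevant bits as defined in the SAM format specification. It is an illustration only: the function names are hypothetical, and treating the secondary-alignment bit (0x100) as the marker for "reads mapping to more than one location" is an assumption, since the description does not say how those reads were flagged in these files.

```python
# Standard FLAG bits from the SAM format specification.
UNMAPPED = 0x4      # segment unmapped
SECONDARY = 0x100   # secondary alignment (assumed marker for multi-mapping)
QC_FAIL = 0x200     # not passing platform/vendor quality checks
DUPLICATE = 0x400   # PCR or optical duplicate


def describe_flag(flag):
    """Return the filter-relevant labels set in a SAM FLAG integer."""
    labels = []
    if flag & UNMAPPED:
        labels.append("unmapped")
    if flag & SECONDARY:
        labels.append("secondary")
    if flag & QC_FAIL:
        labels.append("vendor_qc_fail")
    if flag & DUPLICATE:
        labels.append("duplicate")
    return labels


def is_clean(flag):
    """True if none of the four filter-relevant bits is set."""
    return not describe_flag(flag)
```

A properly paired first-in-pair read with FLAG 99 carries none of these bits, so `is_clean(99)` is true, whereas the same read marked as a duplicate (FLAG 1123) is reported as `["duplicate"]`.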