Project description:The most widely used method for detecting genome-wide protein-DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Here, we conducted the first objective analysis of tiling array platforms and analysis algorithms in a simulated ChIP-chip experiment. Eight independent groups hybridized mixtures of human genomic DNA and "spike-ins" to four tiling array platforms. The spike-ins comprised nearly 100 human sequences at various concentrations. Each group predicted the spike-in locations without knowing how many spike-ins there were, where they were located, or the range of concentrations. All commercial tiling array platforms performed well, although each platform and analysis algorithm had distinct performance and cost characteristics. Simple sequence repeats and genome redundancy tend to produce false positives on oligonucleotide platforms. The spike-in DNA samples and the resulting array data presented here provide a stable benchmark for evaluating future ChIP platforms, protocol improvements, and analysis methods. Keywords: ChIP-chip simulation. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
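Because the spike-in locations are known, predictions can be scored directly against them. The sketch below shows one simplified way to do this. It is not the benchmark's actual scoring procedure. It assumes that predictions and spike-ins are given as half-open `(chromosome, start, end)` intervals. It also assumes that any overlap counts as a hit. The coordinates in the test are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def score_predictions(predicted: List[Interval], truth: List[Interval]) -> dict:
    """Score predicted regions against known spike-in locations.

    A spike-in counts as detected if any prediction overlaps it. A prediction
    counts as a false positive if it overlaps no spike-in. (Simplified rule,
    assumed for illustration.)
    """
    detected = sum(1 for t in truth if any(overlaps(p, t) for p in predicted))
    false_pos = sum(1 for p in predicted if not any(overlaps(p, t) for t in truth))
    sensitivity = detected / len(truth) if truth else 0.0
    precision = (len(predicted) - false_pos) / len(predicted) if predicted else 0.0
    return {
        "detected": detected,
        "false_positives": false_pos,
        "sensitivity": sensitivity,
        "precision": precision,
    }
```

The all-pairs overlap check is quadratic, which is acceptable for roughly 100 spike-ins. Genome-scale prediction sets would call for an interval tree or sorted sweep instead.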
Project description:The LRGASP challenge encompasses human, mouse, and manatee samples sequenced using multiple combinations of protocols and platforms. Different challenges will use distinct subsets of the samples for evaluation. The long-read sequencing platforms used in these challenges are the Pacific Biosciences (PacBio) Sequel II and the Oxford Nanopore (ONT) MinION and PromethION. Samples will also be sequenced on the Illumina HiSeq 2500. The primary LRGASP library prep protocols are “standard” cDNA sequencing, direct RNA sequencing, R2C2, and CapTrap. Each sample will also include Lexogen SIRV-Set 4 spike-ins. We will also provide simulated PacBio and ONT data as part of the evaluations. This particular study focuses on single-strand CAGE sequencing of human iPSCs. It defines CAGE peaks from Illumina HiSeq 2500 data (SR: 150 cycles) from two biological replicates for use in the LRGASP challenge.
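The description does not say how CAGE peaks were defined. To illustrate the general idea, the sketch below groups CAGE tag 5' ends into peaks with simple distance-based clustering. This is a swapped-in technique, not the LRGASP peak caller. The `max_gap` and `min_tags` thresholds are hypothetical. The code also assumes the input positions are already the 5' ends of tags on a single chromosome strand.

```python
from typing import List, Tuple


def cluster_cage_tags(positions: List[int], max_gap: int = 20,
                      min_tags: int = 2) -> List[Tuple[int, int, int]]:
    """Merge CAGE 5'-end positions from one chromosome strand into peaks.

    Consecutive tags at most max_gap bp apart join the same cluster.
    Returns (start, end, tag_count) for clusters with at least min_tags tags.
    The end coordinate is the last tag position (inclusive).
    Thresholds are illustrative assumptions, not LRGASP parameters.
    """
    if not positions:
        return []
    pos = sorted(positions)
    clusters = []
    start = prev = pos[0]
    count = 1
    for p in pos[1:]:
        if p - prev <= max_gap:
            count += 1  # tag extends the current cluster
        else:
            if count >= min_tags:
                clusters.append((start, prev, count))
            start, count = p, 1  # open a new cluster at this tag
        prev = p
    if count >= min_tags:
        clusters.append((start, prev, count))
    return clusters
```

Single-linkage merging like this is easy to follow. However, it can chain nearby promoters into one wide cluster. Dedicated CAGE tools use more careful density-based criteria.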
Project description:Differences in RNA content between cell types introduce bias into gene expression deconvolution methods. If ERCC spike-ins are added to the samples, the cell-type proportions predicted by deconvolution methods can be corrected.
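The correction works roughly as follows. Deconvolution output can be read as the fraction of RNA each cell type contributes. Dividing by RNA content per cell converts that into a fraction of cells. The sketch below assumes the following:

- ERCC molecules were added in proportion to cell number.
- The endogenous:ERCC read ratio therefore scales with mRNA per cell.

The function names are hypothetical. This is a sketch of the general idea, not the study's exact method.

```python
from typing import Dict


def rna_content_per_cell(endogenous_reads: float, ercc_reads: float,
                         ercc_molecules_per_cell: float) -> float:
    """Estimate relative mRNA molecules per cell from an ERCC spike-in ratio.

    Assumes ERCC was spiked in proportion to cell number, so the
    endogenous:ERCC read ratio scales with RNA content per cell.
    """
    return endogenous_reads / ercc_reads * ercc_molecules_per_cell


def correct_proportions(rna_fractions: Dict[str, float],
                        rna_content: Dict[str, float]) -> Dict[str, float]:
    """Convert RNA-weighted fractions from deconvolution into cell fractions.

    Each predicted fraction is divided by that cell type's RNA content per
    cell, and the results are renormalised to sum to 1.
    """
    scaled = {ct: p / rna_content[ct] for ct, p in rna_fractions.items()}
    total = sum(scaled.values())
    return {ct: v / total for ct, v in scaled.items()}
```

Consider a cell type that carries twice as much RNA per cell as another. Without correction it appears overrepresented, and the correction halves its weight before renormalising.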
Project description:A bead suspension and a solution of ERCC spike-ins at a concentration of ~100,000 molecules per droplet were used in Drop-Seq, a novel technology for high-throughput single-cell mRNA-seq.
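A known number of input molecules per droplet makes it possible to estimate how many molecules end up captured as UMIs. The sketch below divides mean ERCC UMIs per bead by the input amount (~100,000 molecules per droplet in the experiment described above). It makes three simplifying assumptions:

- Each bead barcode corresponds to one droplet.
- Doublets and empty beads have already been removed.
- Every droplet received the nominal number of molecules.

The UMI counts in the test are hypothetical.

```python
from statistics import mean
from typing import List


def capture_efficiency(umis_per_bead: List[int],
                       molecules_per_droplet: float = 100_000) -> float:
    """Estimate mRNA capture efficiency from an ERCC-only Drop-Seq run.

    umis_per_bead: total ERCC UMIs detected for each bead barcode
    (assumed to be one bead per droplet, doublets already removed).
    molecules_per_droplet: nominal ERCC molecules per droplet.
    """
    if not umis_per_bead:
        raise ValueError("need at least one bead")
    return mean(umis_per_bead) / molecules_per_droplet
```

Because droplet loading is itself stochastic, per-bead efficiencies scatter around this mean. The median is a more robust alternative if outlier beads are present.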
Project description:Normalization of high-throughput small RNA sequencing (sRNA-Seq) data is required to compare sRNA levels across samples. Commonly used relative normalization approaches can lead to erroneous conclusions because small RNA populations fluctuate between tissues. We developed a set of sRNA spike-in oligonucleotides (sRNA spike-ins) that enable absolute normalization of sRNA-Seq data across independent experiments. When used together with mRNA spike-in oligonucleotides, they also enable genome-wide estimation of sRNA:mRNA stoichiometries.
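Absolute normalization with spike-ins generally works by relating the known spike-in amounts to their read counts, then inverting that relationship for endogenous sRNAs. The sketch below fits a per-sample log-log linear standard curve by least squares. That model choice is an assumption, and the published method may differ. The function names and test values are hypothetical.

```python
import math
from typing import List, Tuple


def fit_spike_curve(spike_reads: List[float],
                    spike_molecules: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(reads) = intercept + slope * log10(molecules).

    Spike-ins with zero reads must be removed first (log10(0) is undefined).
    At least two distinct spike-in amounts are required.
    """
    xs = [math.log10(m) for m in spike_molecules]
    ys = [math.log10(r) for r in spike_reads]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def reads_to_molecules(reads: float, intercept: float, slope: float) -> float:
    """Invert the fitted standard curve to estimate absolute molecule counts."""
    return 10 ** ((math.log10(reads) - intercept) / slope)
```

Suppose the mRNA spike-ins are fitted the same way within the same sample. An sRNA:mRNA stoichiometry is then the ratio of the two estimated molecule counts.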
Project description:Individual HEK cells were dispensed into individual wells using an F.SIGHT single-cell dispenser, and their diameters were recorded. Each well contained 0.0321 pg of molecular spike-ins: a highly complex set of 264 molecular spikes based on 11 unique spike sequences spanning different lengths (570 to 3,070 nt) and GC contents (40-60%). Libraries were generated with the Smart-seq3xpress protocol.
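To put 0.0321 pg per well in perspective, the mass can be converted into an approximate molecule count. The sketch assumes the spikes are single-stranded RNA. It uses a commonly cited approximation for ssRNA molar mass (about 320.5 g/mol per nucleotide plus 159 g/mol). The 1,500 nt mean length in the test is hypothetical. The real pool mixes lengths at unequal abundances, so this gives only an order of magnitude.

```python
AVOGADRO = 6.02214076e23    # molecules per mole (exact SI value)
NT_MASS_SSRNA = 320.5       # approx. g/mol per nucleotide of ssRNA (assumption)
END_MASS_SSRNA = 159.0      # approx. g/mol correction for molecule ends (assumption)


def molecules_from_mass(mass_pg: float, length_nt: float) -> float:
    """Approximate the number of ssRNA molecules in mass_pg picograms.

    length_nt is a single representative length; for a mixed pool this is
    only an order-of-magnitude estimate.
    """
    grams = mass_pg * 1e-12
    molar_mass = length_nt * NT_MASS_SSRNA + END_MASS_SSRNA  # g/mol
    return grams / molar_mass * AVOGADRO
```

With the hypothetical 1,500 nt mean length, 0.0321 pg corresponds to roughly 4 × 10^4 molecules per well.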
Project description:Sequencing was performed to assess the ability of Nanopore direct cDNA and native RNA sequencing to characterise human transcriptomes. Total RNA was extracted from either HAP1 or HEK293 cells, and the polyA+ fraction was isolated using oligo(dT) Dynabeads. Libraries were prepared using Oxford Nanopore Technologies (ONT) kits according to the manufacturer's instructions. Samples were then sequenced on ONT R9.4 flow cells, and raw reads were recorded as FAST5 files by the ONT MinKNOW software. The FAST5 reads were then base-called with the ONT Albacore software to generate FASTQ reads.
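Once Albacore has produced FASTQ reads, common first checks are per-read quality and the read length distribution. The sketch below makes several assumptions:

- Each record has exactly four lines.
- Quality is encoded as Phred+33.
- Per-read quality is the simple arithmetic mean of Phred scores (some ONT tools average error probabilities instead).

The helper names are hypothetical and are not part of any ONT software.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:].split()[0], seq, qual  # drop '@' and header metadata


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean Phred score of a read (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

For large runs, pass an open file handle to `read_fastq` rather than a list, so that records are streamed instead of loaded into memory.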
Project description:A highly complex set of 264 molecular spikes was designed, based on 11 unique spike sequences spanning different lengths (570 to 3,070 nt) and GC contents (40-60%). To precisely evaluate quantification across different expression levels, transcript lengths, and GC contents, 7-nucleotide barcodes in 2-fold abundance steps were cloned into each spike sequence (12 steps in duplicate; 24 barcodes per sequence). This created a standard curve for each spike sequence. To determine the molecular abundance of each of the 264 molecular spike-ins (i.e., the ‘ground truth’), we performed exhaustive sequencing across the spike barcodes and spUMIs. This showed that the pool contains a total of 76 million unique molecules.
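The design and the ground-truth counting can be sketched in a few lines. The layout reproduces the stated arithmetic: 11 sequences × 12 two-fold steps × 2 replicates = 264 spikes. The sketch rests on three assumptions:

- Step k has relative abundance 2^k.
- Reads are already parsed into (spike barcode, spUMI) pairs.
- Linearity is judged by a least-squares slope in log2 space.

The function names and barcode strings are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

N_SEQUENCES = 11   # unique spike sequences
N_STEPS = 12       # 2-fold abundance steps per sequence
N_REPLICATES = 2   # each step present in duplicate


def design_layout() -> List[Tuple[int, int, int, float]]:
    """(sequence, step, replicate, relative_abundance) for every spike.

    Relative abundance 2**step is an assumed mapping of the 2-fold steps.
    """
    return [(s, k, r, 2.0 ** k)
            for s in range(N_SEQUENCES)
            for k in range(N_STEPS)
            for r in range(N_REPLICATES)]


def count_unique_molecules(reads: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct spUMIs per spike barcode from (barcode, spUMI) pairs."""
    umis: Dict[str, Set[str]] = {}
    for barcode, umi in reads:
        umis.setdefault(barcode, set()).add(umi)
    return {bc: len(u) for bc, u in umis.items()}


def curve_slope(expected: List[float], observed: List[float]) -> float:
    """Least-squares slope of log2(observed) on log2(expected).

    A slope near 1 means counts scale linearly with input abundance.
    Zero counts must be removed first, since log2(0) is undefined.
    """
    xs = [math.log2(e) for e in expected]
    ys = [math.log2(o) for o in observed]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under the 2^k assumption, twelve 2-fold steps span a 2,048-fold (2^11) range within each spike sequence.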