Project description:The purpose of this work was to describe a computational and analytical methodology for profiling small RNA by high-throughput sequencing. The datasets here were used to develop synthetic oligoribonucleotides as spike-in standards. We assessed the use of synthetic oligoribonucleotide standards as spike-in controls. These standards provide an objective reference against which samples can be compared. Standards were added to the total RNA (100 µg) in the following amounts: Std2 (TATATGCAAGTCCGGCCATAC) 0.01 pmol, Std3 (TAGCTAACGCATATCCGCATC) 0.1 pmol, Std6 (TGAAGCTGACATCGGTCATCC) 1.0 pmol.
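As an illustration of how spike-ins added in known amounts can serve as a reference, the following minimal Python sketch fits a log-log line through the three standards and uses it to convert a read count into an approximate amount. The spike-in amounts come from the description above. The read counts, the function names, and the assumption that counts scale log-linearly with input are hypothetical and are not taken from the dataset.

```python
import math

# Spike-in amounts from the description (pmol added to 100 µg total RNA).
SPIKE_PMOL = {"Std2": 0.01, "Std3": 0.1, "Std6": 1.0}

# Hypothetical read counts for each standard (illustrative only, not real data).
EXAMPLE_COUNTS = {"Std2": 120, "Std3": 1200, "Std6": 12000}


def fit_loglog(amounts, counts):
    """Least-squares fit of log10(count) = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def count_to_pmol(count, slope, intercept):
    """Invert the fitted line to estimate the amount behind a read count."""
    return 10 ** ((math.log10(count) - intercept) / slope)


names = sorted(SPIKE_PMOL)
slope, intercept = fit_loglog(
    [SPIKE_PMOL[n] for n in names],
    [EXAMPLE_COUNTS[n] for n in names],
)
estimated = count_to_pmol(1200, slope, intercept)
```

With three standards spanning two orders of magnitude, the fit only supports interpolation within 0.01 to 1.0 pmol. Estimates outside that range would be extrapolations.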
Project description:The phi X 174 bacteriophage was first sequenced in 1977, and has since become the most widely used standard in molecular biology and next-generation sequencing. However, with the advent of affordable DNA synthesis and de novo gene design, we considered whether we could engineer a synthetic genome, termed SynX, specifically tailored for use as a universal molecular standard. The SynX genome encodes 21 synthetic genes that can be in vitro transcribed to generate matched mRNA controls, and in vitro translated to generate matched protein controls. This enables the use of SynX as a matched control to compare across genomic, transcriptomic and proteomic experiments. The synthetic genes provide qualitative controls that measure sequencing accuracy across k-mers, GC-rich and repeat sequences, as well as act as quantitative controls that measure sensitivity and quantitative accuracy. We show how the SynX genome can measure DNA sequencing, evaluate gene expression in RNA sequencing experiments, or quantify proteins in mass spectrometry. Unlike previous spike-in controls, the SynX DNA, RNA and protein controls can be independently and sustainably prepared by recipient laboratories using common molecular biology techniques, and widely shared as a universal molecular standard.
Project description:While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a training data set, which is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 units, and by as much as 13 units at CpG sites. In addition, since reads mapping to the genome are not used for recalibration, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. Four human RNA samples with equimolar ERCC spike-in standards were sequenced on Illumina.
Two human brain/liver/muscle RNA mixtures with dynamic range of ERCC spike-in standards were sequenced on SOLiD.
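The idea behind recalibrating against spike-ins can be sketched in a few lines of Python. Because the spike-in sequences are known, every mismatch in a read mapped to a spike-in can be counted as a sequencing error, and the observed error rate for each reported quality score can be converted back to a Phred-scaled value (Q = -10 log10 p). This is a simplified illustration and not GATK's actual algorithm. The function names, the input format, and the +1/+2 pseudocount are assumptions made for the example.

```python
import math
from collections import defaultdict


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)


def empirical_quality(observations):
    """Estimate an empirical quality for each reported quality score.

    observations: iterable of (reported_q, is_mismatch) pairs, one per base
    aligned to a spike-in of known sequence (hypothetical input format).
    A +1/+2 pseudocount (an assumption of this sketch) avoids log10(0)
    when a bin has no mismatches.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for reported_q, is_mismatch in observations:
        totals[reported_q] += 1
        if is_mismatch:
            errors[reported_q] += 1
    return {q: phred((errors[q] + 1) / (totals[q] + 2)) for q in totals}


# Hypothetical example: 998 bases reported as Q30, 9 of which mismatch.
obs = [(30, True)] * 9 + [(30, False)] * 989
table = empirical_quality(obs)
overestimate = 30 - table[30]
```

In this made-up example the bases reported as Q30 behave like Q20 bases, so the reported score overestimates quality by about 10 units. A real recalibration would also stratify by covariates such as machine cycle and dinucleotide context, which the abstract identifies as sources of systematic error.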
2012-03-03 | GSE36217 | GEO
Project description:Sequencing of a synthetic spike-in control with complex variants
Project description:The biological functions of the circadian clock in growth and development have been well elucidated in model plants, while its regulatory roles in crop species, especially its roles in yield-related traits, are poorly understood. Here, we characterize the core clock gene CCA1 homoeologs in wheat and study their biological functions in seedling growth and spike development. TaCCA1 homoeologs exhibit typical diurnal expression patterns, which are positively regulated by rhythmic histone modifications (H3K4me3, H3K9ac and H3K36me3). TaCCA1s are preferentially located in the nucleus and tend to form both homo- and heterodimers. TaCCA1 overexpression (TaCCA1-OE) transgenic wheat plants show disrupted circadian rhythmicity coupled with reduced chlorophyll content, starch content and biomass at the seedling stage, as well as decreased spike length, grain number per spike and grain size at the ripening stage. Further studies using DNA affinity purification followed by deep sequencing (DAP-seq) indicate that TaCCA1 preferentially binds to sequences similar to the “evening element” (EE) motif in the wheat genome, particularly in genes associated with photosynthesis, carbon utilization and auxin homeostasis, and decreased transcript levels of these target genes are observed in TaCCA1-OE transgenic wheat plants. Collectively, our study provides novel insights into a circadian-mediated mechanism of gene regulation that coordinates photosynthetic and metabolic activities in wheat, which is important for optimal plant growth and crop yield formation.