Project description:Spot intensity serves as a proxy for gene expression in dual-label microarray experiments. Dye bias is defined as an intensity difference between samples labeled with different dyes attributable to the dyes instead of the gene expression in the samples. Dye bias that is not removed by array normalization can introduce bias into comparisons between samples of interest. But if the bias is consistent across the samples for the same gene, it can be corrected by proper experimental design and analysis. If the dye bias is not consistent across samples for the same gene, but is different for different samples, then removing the bias becomes more problematic, perhaps indicating a technical limitation to the ability of fluorescent signals to accurately represent gene expression. Thus, it is important to characterize dye bias to determine: (1) whether it will be removed for all genes by array normalization, (2) whether it will not be removed by normalization but can be removed by proper experimental design and analysis and (3) whether dye bias correction is more problematic than either of these and is not easily removable. Keywords: dye swap design
Project description:Coupling molecular biology to high throughput sequencing has revolutionized the study of biology. Molecular genomics techniques are continually refined to provide higher resolution mapping of nucleic acid interactions and nucleic acid structure. These assays are converging on single-nucleotide resolution measurements, but the sequence preferences of molecular biology enzymes can interfere with the accurate interpretation of the data. Enzymatic sequence preferences manifest more prominently as the resolution of these assays increases. We developed seqOutBias to seek out enzymatic sequence bias from experimental data and scale individual sequence reads to correct the bias. We show that this software efficiently and successfully corrects the sequence bias in DNase-seq, TACh-seq, ATAC-seq, MNase-seq, and PRO-seq data.
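The idea of scaling reads to offset an enzyme's sequence preference can be illustrated with a small sketch. This is not the seqOutBias implementation: the function names, the choice of k, and the use of a simple expected/observed k-mer frequency ratio are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def scale_factors(genome, read_end_kmers, k):
    """Hypothetical helper: weight for each k-mer observed at read ends.

    weight = (fraction of the k-mer in the genome)
             / (fraction of the k-mer among read ends)
    A k-mer the enzyme favors gets a weight below 1; a disfavored one
    gets a weight above 1.
    """
    background = kmer_counts(genome, k)
    observed = Counter(read_end_kmers)
    bg_total = sum(background.values())
    obs_total = sum(observed.values())
    factors = {}
    for kmer, obs in observed.items():
        expected_frac = background[kmer] / bg_total
        observed_frac = obs / obs_total
        factors[kmer] = expected_frac / observed_frac
    return factors


# Toy data (invented): "AC" is over-represented at read ends, "GT" under-represented.
genome = "ACGTACGTAC"
read_end_kmers = ["AC"] * 6 + ["CG"] * 2 + ["GT"] * 1
factors = scale_factors(genome, read_end_kmers, k=2)
```

Each read would then contribute its k-mer's weight, rather than a count of 1, to the signal track.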
Project description:Spot intensity serves as a proxy for gene expression in dual-label microarray experiments. Dye bias is defined as an intensity difference between samples labeled with different dyes attributable to the dyes instead of the gene expression in the samples. Dye bias that is not removed by array normalization can introduce bias into comparisons between samples of interest. But if the bias is consistent across the samples for the same gene, it can be corrected by proper experimental design and analysis. If the dye bias is not consistent across samples for the same gene, but is different for different samples, then removing the bias becomes more problematic, perhaps indicating a technical limitation to the ability of fluorescent signals to accurately represent gene expression. Thus, it is important to characterize dye bias to determine: (1) whether it will be removed for all genes by array normalization, (2) whether it will not be removed by normalization but can be removed by proper experimental design and analysis and (3) whether dye bias correction is more problematic than either of these and is not easily removable. For two dual-label experiments, one with cDNA arrays and the other with printed oligonucleotide arrays, Stratagene universal human reference RNA was used as a standard for testing with RNA from cell lines MCF10a, LNCAP, L428, SUDHL, OCILY3 and Jurkat. All arrays were dye-swapped at least twice. There were a total of 28 cDNA arrays and 30 oligonucleotide arrays.
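A minimal sketch of why a dye-swap pair can remove a gene-specific dye bias that is constant across samples. It assumes a simple additive model on the log2 scale (observed log ratio = true log ratio + dye bias); the function name and the numbers are illustrative and are not taken from the study.

```python
def dye_swap_estimates(m_forward, m_swapped):
    """Combine one gene's log2 ratios from a dye-swap pair.

    m_forward: log2(dye 1 / dye 2) with the test sample on dye 1
    m_swapped: log2(dye 1 / dye 2) with the test sample on dye 2
    Assumed model:  m_forward =  true_log_ratio + dye_bias
                    m_swapped = -true_log_ratio + dye_bias
    """
    log_ratio = (m_forward - m_swapped) / 2.0  # the bias cancels in the difference
    dye_bias = (m_forward + m_swapped) / 2.0   # the signal cancels in the sum
    return log_ratio, dye_bias


# Invented values: true log2 ratio of 1.0 and a gene-specific dye bias of 0.5.
true_log_ratio, gene_dye_bias = 1.0, 0.5
m_forward = true_log_ratio + gene_dye_bias
m_swapped = -true_log_ratio + gene_dye_bias
log_ratio, dye_bias = dye_swap_estimates(m_forward, m_swapped)
print(log_ratio, dye_bias)  # 1.0 0.5
```

If the bias differed between the two arrays of the pair (that is, depended on the sample), it would no longer cancel in the difference, which is the problematic case described above.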
Project description:We report widespread ChIP-seq bias at highly expressed genes in yeast that could lead to misinterpretation. ChIP-seq for multiple transcription or chromatin-associated factors and negative controls.
Project description:The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers is hindered by high variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in library amplification steps and sequencing depth variation. Our analytical contributions are the description of the Linear Quadratic (LQ) relation between the mean and variance of the sequence counts in an RNA-seq experiment and the derivation of the Poisson truncated mixture as the underlying probability distribution for RNA-seq data. Using a large number of sequencing datasets, we demonstrate here how one can use this modeling framework to calculate empirical correction factors for ligase bias, while accounting for random variation in sequence counts. Bias correction may remove the majority of bias in the absence of differential expression and more than 40% of the bias in the presence of variable expression of miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition.
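As an illustration of a linear-quadratic mean-variance relation, the sketch below fits variance = a * mean + b * mean^2 by ordinary least squares. The functional form without an intercept, the unweighted fit, and the function name are assumptions; the description does not give the exact parameterization or the authors' estimation procedure.

```python
def fit_linear_quadratic(means, variances):
    """Least-squares fit of variance = a * mean + b * mean**2 (no intercept).

    Solves the 2x2 normal equations directly, so no external libraries
    are needed.
    """
    s11 = sum(m * m for m in means)
    s12 = sum(m ** 3 for m in means)
    s22 = sum(m ** 4 for m in means)
    t1 = sum(m * v for m, v in zip(means, variances))
    t2 = sum(m * m * v for m, v in zip(means, variances))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("need at least two distinct nonzero means")
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Synthetic, noise-free data generated with a = 1.0 and b = 0.05.
means = [1, 2, 4, 8, 16]
variances = [m + 0.05 * m * m for m in means]
a, b = fit_linear_quadratic(means, variances)
```

With a = 1 and b = 0 the relation reduces to the Poisson case (variance equal to the mean); a positive b captures the extra variation that grows with the square of the mean.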
Project description:DNase I footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. A number of computational approaches have been developed to accurately identify DNase-seq footprints, and these methods have been used as predictors of TF-DNA interactions by themselves or in combination with other epigenetic features. However, recent studies have pointed to a substantial cleavage bias of DNase and its impact on footprinting, casting doubts on its predictive performance. To assess the potential for using DNase I to identify individual binding sites, we performed DNase-seq experiments on deproteinized naked genomic DNA isolated from two different cell types and determined sequence cleavage bias associated with the DNase-seq protocol. This allowed us to build cleavage bias corrected footprint models specific to individual transcription factors. The predictive performance of these DNase-seq-based binding site models demonstrated that predicted footprints corresponded to high confidence TF-DNA interactions. To quantify the DNase I sequence-dependent cleavage bias, we performed DNase-seq experiments using deproteinized DNA from K562 and MCF7 cell lines.
Project description:Usage of synonymous codons represents a characteristic pattern of preference in each organism. It has been inferred that such bias of codon usage has evolved as a result of adaptation for efficient synthesis of proteins. Here we examined synonymous codon usage in genes of the fission yeast Schizosaccharomyces pombe, and compared codon usage bias with the expression level of each gene. In this organism, synonymous codon usage bias was correlated with gene expression level; the bias was most obvious in two-codon amino acids. A similar pattern of the codon usage bias was also observed in Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans, but was not obvious in Oryza sativa, Drosophila melanogaster, Takifugu rubripes and Homo sapiens. As codons of the highly expressed genes have greater influence on translational efficiency than codons of genes expressed at lower levels, it is likely that codon usage in the S. pombe genome has been optimized by translational selection through evolution.
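One common way to quantify synonymous codon usage is relative synonymous codon usage (RSCU): a codon's count divided by the mean count of all codons for the same amino acid. The sketch below computes it for the two-codon amino acids of the standard genetic code. Whether the study used RSCU or another measure is not stated, so the metric, the function name, and the toy sequence are assumptions.

```python
from collections import Counter

# Two-codon amino acids in the standard genetic code.
TWO_CODON_FAMILIES = {
    "Phe": ("TTT", "TTC"), "Tyr": ("TAT", "TAC"), "His": ("CAT", "CAC"),
    "Gln": ("CAA", "CAG"), "Asn": ("AAT", "AAC"), "Lys": ("AAA", "AAG"),
    "Asp": ("GAT", "GAC"), "Glu": ("GAA", "GAG"), "Cys": ("TGT", "TGC"),
}


def rscu_two_codon(cds):
    """RSCU of each codon in the two-codon families found in a coding sequence.

    A value of 1.0 means no preference; values near 2.0 or 0.0 mean one
    codon of the pair is used almost exclusively.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    rscu = {}
    for family in TWO_CODON_FAMILIES.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            rscu[codon] = counts[codon] * len(family) / total
    return rscu


# Toy in-frame sequence (invented): Lys-Lys-Lys-Asp-Asp-Asp.
rscu = rscu_two_codon("AAGAAGAAAGATGACGAC")
```

Computing these values per gene and comparing them against expression levels would be one way to look for the correlation the description reports.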