Estimating accuracy of absolute gene expression measurement by RNA-Seq and microarrays with proteomics
ABSTRACT: Microarrays revolutionized biological research by enabling gene expression comparisons on a transcriptome-wide scale. Microarrays, however, do not estimate absolute expression levels accurately. At present, high-throughput sequencing is emerging as an alternative methodology for transcriptome studies. Although it is free of many limitations imposed by microarray design, its potential to estimate absolute transcript levels is unknown. In this study, we evaluate the relative accuracy of microarrays and transcriptome sequencing (RNA-Seq) using a third methodology: proteomics. We find that RNA-Seq provides a better estimate of absolute expression levels.
Project description:We first determined whether we could reproduce the agreement between mRNA expression estimates from microarrays and from RNA-Seq reported in other studies. For this purpose, we collected mRNA expression data with both methodologies from two independent cerebellar samples, each containing pooled mRNA from 5 adult human individuals. Next, to test whether biological variation among samples would substantially reduce correlation strength, we compared expression levels determined by RNA-Seq in the two pooled samples with microarray data obtained from different individuals: Affymetrix Exon Array measurements from 5 individual adult human cerebellar samples, none of which were included in the two pooled samples. Finally, because technical and stochastic variation are extremely unlikely to produce a better correlation between mRNA and protein expression measurements, we argue that the technology yielding the better correlation must provide the more accurate measurements.
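The argument above reduces to comparing two correlations against the same protein measurements. The following is a minimal Python sketch, assuming matched per-gene values for RNA-Seq, microarray and proteomics, and using Spearman rank correlation as the agreement measure (our choice for illustration; the description does not name the statistic). All function names are hypothetical.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def better_platform(rnaseq: Sequence[float], microarray: Sequence[float],
                    protein: Sequence[float]) -> str:
    """Name the platform whose mRNA estimates track protein abundance more closely."""
    r_seq = spearman(rnaseq, protein)
    r_arr = spearman(microarray, protein)
    return "RNA-Seq" if r_seq > r_arr else "microarray"
```

With toy values (not data from this study) `protein = [1, 2, 3, 4, 5]`, `rnaseq = [10, 20, 30, 50, 40]` and `microarray = [30, 10, 20, 50, 40]`, the rank correlations are 0.9 and 0.6, so `better_platform` returns `"RNA-Seq"`. A rank-based measure is convenient here because it ignores monotone platform-specific scaling such as log versus linear signal.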
Project description:Digestion with restriction enzymes is a classical approach for probing DNA accessibility in chromatin. It allows monitoring of both the cut and the uncut fractions and thereby the determination of accessibility, or occupancy (= 1 - accessibility), in absolute terms as the percentage of cut or uncut molecules out of the total. The protocol presented here takes this classical approach to the genome-wide level. After exhaustive restriction enzyme digestion of chromatin, DNA is purified, sheared and converted into libraries for high-throughput sequencing. Bioinformatic analysis counts DNA fragments cut by the restriction enzyme as well as DNA ends generated by the restriction enzyme digest, and derives from these the fraction of accessible DNA. This straightforward principle is technically challenged because library preparation and sequencing bias the scoring of DNA fragments with ends generated by restriction enzymes versus by shearing. Our protocol includes two orthogonal approaches to correct for this bias, the “corrected cut-uncut” and the “cut-all cut” methods, so that accurate measurements of absolute accessibility/occupancy at restriction sites throughout a genome are possible. The protocol is presented for the example of S. cerevisiae chromatin but may be adapted for any other species.
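The absolute scale follows directly from counting: at each restriction site, accessibility is the cut fraction and occupancy is its complement. A minimal Python sketch with a hypothetical `SiteCounts` record; it deliberately leaves out the “corrected cut-uncut” and “cut-all cut” bias corrections, whose details are not given in this description.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Molecule counts at one restriction site (hypothetical structure)."""
    cut: int    # molecules cut at the site
    uncut: int  # molecules spanning the site without a cut


def accessibility(site: SiteCounts) -> float:
    """Fraction of molecules cut at the site: cut / (cut + uncut)."""
    total = site.cut + site.uncut
    if total == 0:
        raise ValueError("no molecules cover this site")
    return site.cut / total


def occupancy(site: SiteCounts) -> float:
    """Occupancy is the complement of accessibility (= 1 - accessibility)."""
    return 1.0 - accessibility(site)
```

For a site with 30 cut and 10 uncut molecules, accessibility is 0.75 and occupancy 0.25. Without a bias correction, these raw fractions would inherit the library-preparation bias described above.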
Project description:We developed a method to estimate the 3D interaction probabilities of chromatin loops across the genome on an absolute scale from Micro-C maps. To calibrate the method, we performed Micro-C on two engineered mouse embryonic stem cell (mESC) lines, each containing a fluorescently labeled chromatin loop that was quantified in previous live imaging studies. One loop is an endogenous loop containing the Fbn2 gene, and the other is a synthetic loop near the Npr3 gene. We performed two replicates of Micro-C per cell line. Using our absolute quantification method, we find that loops generally form with low probabilities. We also provide an ultra-deep merged Micro-C map for mESCs that combines all existing mESC Micro-C datasets to date, containing a total of 15.6 billion unique interactions.
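One way to picture calibrating Micro-C signal against imaging is a single scale factor per imaged loop. The Python sketch below is only that: it assumes, hypothetically, that absolute loop probability is proportional to a normalized Micro-C contact signal, which is not necessarily how the described method works. The function names and the example numbers are illustrative, not values from these experiments.

```python
from statistics import mean


def calibration_factor(loop_signal: float, imaged_probability: float) -> float:
    """Probability per unit of Micro-C loop signal, from one imaged loop.

    Hypothetical assumption: loop probability is proportional to the
    normalized Micro-C contact signal at the loop.
    """
    if loop_signal <= 0:
        raise ValueError("calibration loop needs a positive contact signal")
    return imaged_probability / loop_signal


def pooled_factor(calibrations: list[tuple[float, float]]) -> float:
    """Average factors from several (signal, imaged probability) loops."""
    return mean(calibration_factor(signal, prob) for signal, prob in calibrations)


def loop_probability(loop_signal: float, factor: float) -> float:
    """Convert a loop's contact signal to a probability, clipped to [0, 1]."""
    return min(1.0, max(0.0, loop_signal * factor))
```

With two made-up calibration loops, `(200.0, 0.1)` and `(100.0, 0.1)`, the pooled factor is 0.00075, and a loop with signal 40 maps to a probability of 0.03. Clipping keeps the linear extrapolation inside the valid probability range.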
Project description:Background. Despite the emergence of RNA sequencing (RNA-seq), microarrays remain in widespread use for gene expression analysis in the clinic. There are over 767,000 RNA microarrays from human samples in public repositories, which are an invaluable resource for biomedical research and personalized medicine. Absolute gene expression analysis allows transcriptome profiling of all expressed genes under a specific biological condition without the need for a reference sample. However, background fluorescence makes it challenging to determine absolute gene expression in microarrays. Given that the Y chromosome is absent in female subjects, we used it as the basis of a new approach for absolute gene expression analysis, in which the fluorescence of Y chromosome genes in female subjects was used as the background fluorescence for all probes in the microarray. This fluorescence was used to establish an absolute gene expression threshold, allowing the differentiation between expressed and non-expressed genes in microarrays. Methods. We extracted RNA from leukocyte samples of 16 children (9 males and 7 females, aged 6 to 10 years). An Affymetrix GeneChip Human Gene 1.0 ST Array was run for each sample, and the fluorescence of the 124 Y chromosome genes was used to calculate the absolute gene expression threshold. Several genes classified as expressed or non-expressed according to this threshold were then compared against expression measured by real-time quantitative polymerase chain reaction (RT-qPCR). Results. Of the 124 Y chromosome genes, three (DDX3Y, TXLNG2P and EIF1AY) that displayed significant differences between sexes were used to calculate the absolute gene expression threshold. Using this threshold, we selected 13 expressed and non-expressed genes, and their expression levels were confirmed by RT-qPCR.
Then, we selected the top 5% most expressed genes and found that several KEGG pathways were significantly enriched. Interestingly, these pathways were related to typical functions of leukocytes, such as antigen processing and presentation and natural killer cell-mediated cytotoxicity. We also applied this method to obtain the absolute gene expression threshold in previously published microarray data of liver cells, where the top 5% expressed genes showed an enrichment of KEGG pathways typical of liver cells. Our results suggest that the three selected Y chromosome genes can be used to calculate an absolute gene expression threshold, allowing transcriptome profiling of microarray data without the need for an additional reference experiment. Discussion. Our approach, based on establishing a threshold for absolute gene expression analysis, offers a new way to analyze thousands of microarrays from public databases. This allows the study of different human diseases without requiring additional samples for relative expression experiments.
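The thresholding and top-5% selection described above can be sketched in Python as follows. This sketch assumes per-gene probe signals on a common scale and, as a labeled assumption, takes the threshold to be the mean signal of DDX3Y, TXLNG2P and EIF1AY across female samples. The description does not state the exact statistic, and the function names are hypothetical.

```python
import math
from statistics import mean

# The three Y chromosome genes named in the Results.
Y_GENES = ("DDX3Y", "TXLNG2P", "EIF1AY")


def expression_threshold(female_samples: list[dict[str, float]]) -> float:
    """Background level from Y-gene probe signals in female samples.

    Assumption: the threshold is the mean signal of the three Y genes
    across all female samples (the exact statistic is not specified).
    """
    return mean(sample[g] for sample in female_samples for g in Y_GENES)


def expressed_genes(sample: dict[str, float], threshold: float) -> set[str]:
    """Genes whose signal exceeds the background threshold."""
    return {g for g, v in sample.items() if v > threshold}


def top_fraction(sample: dict[str, float], fraction: float = 0.05) -> list[str]:
    """Most strongly expressed genes, e.g. the top 5% used for KEGG enrichment."""
    n = max(1, math.ceil(len(sample) * fraction))
    return sorted(sample, key=sample.get, reverse=True)[:n]
```

Because the female Y-gene signal stands in for pure background, any gene in a sample that falls at or below it is treated as non-expressed. The top-fraction list would then be passed to a pathway-enrichment tool.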
Project description:BACKGROUND: Microarrays revolutionized biological research by enabling gene expression comparisons on a transcriptome-wide scale. Microarrays, however, do not estimate absolute expression levels accurately. At present, high-throughput sequencing is emerging as an alternative methodology for transcriptome studies. Although it is free of many limitations imposed by microarray design, its potential to estimate absolute transcript levels is unknown. RESULTS: In this study, we evaluate the relative accuracy of microarrays and transcriptome sequencing (RNA-Seq) using a third methodology: proteomics. We find that RNA-Seq provides a better estimate of absolute expression levels. CONCLUSION: Our results show that, in terms of overall technical performance, RNA-Seq is the technique of choice for studies that require accurate estimation of absolute transcript levels.
Project description:Computing absolute protein abundances using mass spectrometry (MS) is a widely used technique in quantitative biology. An important and often overlooked aspect of this methodology is technical reproducibility, i.e. how precise abundance predictions are when the instrument is used on repeated occasions to measure the same sample. Here, we present a proteomics dataset of Saccharomyces cerevisiae with both biological and inter-run technical triplicates, which we use to analyze both the accuracy and the precision of the MS instrument. We also investigate how to improve the quality of predictions by using 4 alternative methods for estimating absolute protein abundance from MS intensities. We show that a simple normalization and rescaling approach is as accurate as, but much more precise than, methods that rely on external standards. Furthermore, we show that technical reproducibility is significantly lower than biological reproducibility for all the evaluated methods. The results presented here serve as a benchmark for choosing the best way to interpret MS results when computing protein abundances, and highlight the limitations of the technique to consider when interpreting results.
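Precision across repeated runs of the same sample is commonly summarized as a coefficient of variation (CV). The Python sketch below pairs that measure with one hypothetical form of "normalization and rescaling": each protein receives its share of the run's total intensity, scaled to an assumed total abundance. This is an illustration under that assumption, not necessarily any of the 4 methods evaluated here.

```python
from statistics import mean, median, stdev


def normalize_and_rescale(intensities: dict[str, float],
                          total_abundance: float) -> dict[str, float]:
    """Give each protein its share of the run's total intensity, scaled to a total.

    Hypothetical stand-in for a simple normalization and rescaling method;
    total_abundance (e.g. total protein per cell) is an assumed input.
    """
    run_total = sum(intensities.values())
    return {p: total_abundance * v / run_total for p, v in intensities.items()}


def coefficient_of_variation(values: list[float]) -> float:
    """Relative spread (sample SD / mean) of one protein across replicate runs."""
    return stdev(values) / mean(values)


def median_cv(runs: list[dict[str, float]]) -> float:
    """Median per-protein CV over replicate runs; lower means more precise."""
    proteins = runs[0].keys()
    return median(coefficient_of_variation([run[p] for run in runs]) for p in proteins)
```

In a toy case where a second technical run simply reads every intensity twice as high, the raw per-protein CV is about 0.47, while after normalization it drops to 0. Normalizing each run to its own total removes run-wide loading or sensitivity differences.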
Project description:MicroRNAs (miRNAs) have been shown to play an important role in many different cellular, developmental, and physiological processes. Accordingly, numerous methods have been established to identify and quantify miRNAs. The short length of miRNA sequences results in a wide range of melting temperatures and, moreover, hampers the proper selection of detection probes or optimized PCR primers. While miRNA microarrays allow massively parallel and accurate relative measurement of all known miRNAs, they have so far been less useful as an assay for absolute quantification. Here, we present a microarray-based approach for global and absolute quantification of miRNAs. The method relies on an equimolar pool of about 1000 synthetic miRNAs of known concentration, which is used as a universal reference and is labeled and hybridized in a dual-colour approach on the same array as the sample of interest. Each miRNA is quantified relative to the universal reference, offsetting biases related to sequence, labeling, hybridization or signal detection method. We demonstrate the accuracy of the method with various spike-in experiments. Further, we quantified miRNA copy numbers in liver samples and CD34(+)CD133(-) hematopoietic stem cells.
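Quantifying against an equimolar reference on the same array reduces to a per-miRNA ratio. The following Python sketch assumes background-corrected channel signals and a known per-miRNA reference amount expressed in attomoles. The unit choice and the function names are assumptions for illustration, and the sketch does not reproduce the authors' normalization steps.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def quantify(sample: dict[str, float], reference: dict[str, float],
             ref_amount_amol: float) -> dict[str, float]:
    """Absolute amount (attomol) of each miRNA from its two-colour signal ratio.

    The reference pool is equimolar, so every miRNA has the same known
    amount ref_amount_amol in the reference channel. Because sample and
    reference hybridize to the same spot, sequence-, labeling- and
    hybridization-related effects are assumed to cancel in the ratio.
    miRNAs without a positive reference signal are skipped.
    """
    return {
        mirna: ref_amount_amol * sample[mirna] / reference[mirna]
        for mirna in sample
        if reference.get(mirna, 0.0) > 0
    }


def to_copies(amount_amol: float) -> float:
    """Convert attomoles to molecule copies (1 amol = 1e-18 mol)."""
    return amount_amol * 1e-18 * AVOGADRO
```

For example, a miRNA whose sample signal is twice its reference signal, with 1 amol per miRNA in the reference pool, is estimated at 2 amol, or roughly 1.2 million copies. Dividing by the number of input cells, if known, would give copies per cell.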