Project description: The NanoString experiments were conducted on 59 cell lines: 12 ovary, 12 lung, 11 colon, 10 breast, 4 pancreas, 2 prostate, and 2 stomach cell lines, plus 6 cell lines from 6 other tissue types. A total of 404 custom probes were designed to estimate the isoform expression of 155 cancer genes, curated from the literature for their more reliable isoform annotations; each of the 155 genes has at least two isoforms. RT-qPCR data on seven genes in twelve of the 59 cancer cell lines are available at https://github.com/compbiolabucf/IntMTQ/tree/master/RT-qPCR
Project description: Background: Most eukaryotic genes produce multiple transcript isoforms through the inclusion or exclusion of particular exons. The isoforms of a gene often play distinct functional roles, so it is necessary to measure isoform expression accurately in addition to gene expression. While previous studies have demonstrated strong agreement between mRNA sequencing (RNA-seq) and array-based gene and/or isoform quantification platforms (gene expression microarrays and Exon-arrays), the more recently developed NanoString platform has not been systematically evaluated and compared, especially in large-scale studies across different cancer types. Results: In this paper, we present a large-scale comparative study of the RNA-seq, NanoString, array-based, and RT-qPCR platforms using 46 cancer cell lines across different cancer types. The goal is to understand and evaluate the capabilities of these platforms for measuring gene and isoform expression in cancer studies. We first performed NanoString experiments on 59 cancer cell lines with 404 custom-designed probes measuring the expression of 478 isoforms in 155 genes, and additional RT-qPCR experiments for a subset of the measured isoforms in 13 cell lines. 
We then combined these data with matched RNA-seq, Exon-array, and Microarray data for 46 of the 59 cell lines for the comparative analysis. Conclusion: Comparing the platforms at both the isoform and gene levels, we found that (1) agreement on isoform expression is lower than agreement on gene expression across the four platforms; (2) NanoString and Exon-array are not consistent in isoform quantification, even though both techniques are based on hybridization; (3) RT-qPCR results are more consistent with RNA-seq and Exon-array than with NanoString in isoform quantification; (4) different RNA-seq isoform quantification methods produce varying estimates, and among them, Net-RSTQ and eXpress are more consistent across platforms than the others; and (5) RNA-seq has the best overall consistency with the other platforms in gene expression quantification.
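Finding (1) rests on comparing how well platforms agree at the isoform level versus the gene level. A minimal sketch of one common way to quantify such agreement, assuming agreement is measured as the Spearman rank correlation of each feature (isoform or gene) across the cell lines shared by two platforms; the description does not state the exact statistic used, and the platform data, cell line names, and function names below are hypothetical.

```python
from statistics import mean, median


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cross_platform_agreement(platform_a, platform_b, min_lines=3):
    """Per-feature Spearman correlation over cell lines measured on both platforms.

    Each platform maps feature -> {cell_line: expression}.
    """
    agreement = {}
    for feature in platform_a.keys() & platform_b.keys():
        shared = sorted(platform_a[feature].keys() & platform_b[feature].keys())
        if len(shared) < min_lines:
            continue  # too few matched cell lines for a meaningful correlation
        xa = [platform_a[feature][c] for c in shared]
        xb = [platform_b[feature][c] for c in shared]
        agreement[feature] = spearman(xa, xb)
    return agreement


# Hypothetical isoform-level values for four matched cell lines.
rnaseq = {
    "GENE1-iso1": {"CL1": 1.0, "CL2": 5.0, "CL3": 3.0, "CL4": 8.0},
    "GENE1-iso2": {"CL1": 1.0, "CL2": 2.0, "CL3": 3.0, "CL4": 4.0},
}
nanostring = {
    "GENE1-iso1": {"CL1": 10.0, "CL2": 40.0, "CL3": 35.0, "CL4": 90.0},
    "GENE1-iso2": {"CL1": 4.0, "CL2": 3.0, "CL3": 2.0, "CL4": 1.0},
}

agreement = cross_platform_agreement(rnaseq, nanostring)
print(agreement, median(agreement.values()))
```

Summarizing the per-feature correlations (for example, by their median) separately for isoform-level and gene-level features gives one simple way to compare agreement at the two levels.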
Project description: Background: Alternative splicing and isoform-level expression profiling is an emerging field of interest within genomics. Splicing-sensitive microarrays, with probes targeted to individual exons or exon junctions, are becoming increasingly popular as tools capable of both expression profiling and finer-scale isoform detection. Despite their intuitive appeal, relatively little is known about the performance of such tools, particularly in comparison with more traditional 3’-targeted microarrays. Here, we use the well-studied Microarray Quality Control (MAQC) dataset to benchmark the Affymetrix Exon Array and compare it with two other popular platforms: Illumina and Affymetrix U133. Results: We show that at the gene expression level, the Exon Array performs comparably to the two 3’-targeted platforms. However, the inter-platform correlation involving the Exon Array is slightly lower than that between the two 3’ arrays. We show that some of the discrepancies stem from the RNA amplification protocols; for example, the Exon Array can detect the expression of non-polyadenylated transcripts. More importantly, we show that many other differences result from the Exon Array's ability to monitor more detailed isoform-level changes; several examples illustrate that changes detected by the 3’ platforms are actually isoform variations, and that the nature of these variations can be resolved using Exon Array data. Finally, we show how the Exon Array can be used to detect alternative isoform differences, such as alternative splicing, transcript termination, and alternative promoter usage. We discuss the possible pitfalls and false positives resulting from isoform-level analysis. Conclusions: The Exon Array is a valuable tool that can be used to profile gene expression while providing important additional information about which gene isoforms are expressed and which vary. 
However, analysis of alternative splicing requires much more hands-on effort and visualization of results to interpret the data correctly, and it generally yields considerably higher false-positive rates than expression analysis. One of the main sources of error in the MAQC dataset is variation in amplification efficiency across transcripts, which existing statistical methods do not adequately correct. We outline approaches to reduce such errors by filtering out potentially problematic data. Keywords: Comparison of the Affymetrix Exon Array 1.0 ST with the Illumina and Affymetrix U133 platforms
Project description: Screening for gene copy-number alterations (CNAs) has improved with the application of genome-wide microarrays, and SNP arrays additionally allow analysis of loss of heterozygosity (LOH). Here we analyzed 10 chronic lymphocytic leukemia (CLL) samples using four different high-resolution platforms: BAC arrays (32K), oligonucleotide arrays (185K, Agilent), and two SNP arrays (250K, Affymetrix; 317K, Illumina). Cross-platform comparison revealed 29 concordantly detected CNAs, including known recurrent alterations, confirming that all platforms are powerful tools when screening for large aberrations. However, the detection of 32 additional regions in only 2-3 of the platforms revealed discrepancies in the detection of small CNAs, which often involved reported copy-number variations. LOH analysis revealed concordance mainly for large regions, but also showed numerous small, nonoverlapping regions and LOH escaping detection. Evaluation of baseline variation and copy-number ratio response showed the best performance for the Agilent platform and confirmed the robustness of BAC arrays. Accordingly, these platforms showed a higher degree of platform-specific CNAs. The SNP arrays displayed higher technical variation, although this was compensated for by their high density of elements. Affymetrix detected more CNAs than Illumina, whereas Illumina showed a lower noise level and a higher detection rate in the LOH analysis. Large-scale studies of genomic aberrations are now feasible, but new tools for LOH analysis are needed.
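The distinction between CNAs detected by all platforms, by only some, and by a single platform can be made concrete with interval overlap. A minimal sketch, assuming a call from one platform counts as supported by another platform when that platform has a call of the same direction (gain or loss) on the same chromosome whose interval intersects it; this any-overlap rule, the `CNA` type, the function names, and all coordinates are hypothetical, and published comparisons may instead use minimum reciprocal overlap or merge regions across platforms first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNA:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "gain" or "loss"


def overlaps(a, b):
    """Same chromosome, same direction, and the intervals intersect."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start < b.end and b.start < a.end)


def supporting_platforms(call, calls_by_platform):
    """Platforms with at least one call overlapping `call` (including its own)."""
    return {name for name, calls in calls_by_platform.items()
            if any(overlaps(call, other) for other in calls)}


def classify_calls(platform, calls_by_platform):
    """Count one platform's calls by how many platforms support them."""
    n = len(calls_by_platform)
    counts = {"all": 0, "partial": 0, "specific": 0}
    for call in calls_by_platform[platform]:
        k = len(supporting_platforms(call, calls_by_platform))
        if k == n:
            counts["all"] += 1
        elif k > 1:
            counts["partial"] += 1
        else:
            counts["specific"] += 1
    return counts


# Hypothetical calls for one sample on the four platforms.
calls = {
    "BAC": [CNA("13", 50_000_000, 51_000_000, "loss")],
    "Agilent": [CNA("13", 50_100_000, 50_900_000, "loss"),
                CNA("2", 1_000_000, 1_050_000, "gain")],
    "Affymetrix": [CNA("13", 49_900_000, 51_100_000, "loss"),
                   CNA("6", 30_000_000, 30_100_000, "loss")],
    "Illumina": [CNA("13", 50_050_000, 50_950_000, "loss"),
                 CNA("6", 30_020_000, 30_080_000, "loss")],
}

for name in calls:
    print(name, classify_calls(name, calls))
```

In this toy input the large loss is shared by all four platforms, the small loss on chromosome 6 is seen by two, and the small gain is specific to one platform, mirroring the three categories discussed above.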
Project description: DNA methylation in the CpG context is fundamental to the epigenetic regulation of gene expression in higher eukaryotes. Disrupted methylation status is implicated in many diseases, as well as in cellular differentiation, imprinting, and other biological processes. Techniques that enrich for biologically relevant CpG-rich genomic regions are desirable because, depending on the size of an organism's methylome, the sequencing depth required to cover all CpGs can be prohibitively expensive. Currently, restriction-enzyme-based Reduced Representation Bisulfite Sequencing (RRBS) and its modified protocols are widely used to study methylation differences. More recently, Agilent Technologies and Roche NimbleGen have marketed in-solution custom-capture hybridization platforms that aim both to reduce sequencing costs and to capture CpGs of known biological relevance. These three methods target approximately 10-13% of the human methylome. For each platform, namely restriction-enzyme-based enhanced reduced representation bisulfite sequencing (ERRBS), capture-based Agilent SureSelect Methyl-seq (SSMethylSeq), and capture-based Roche NimbleGen SeqCap Epi CpGiant (CpGiant), we prepared libraries from DNA of the human lung fibroblast cell line IMR90 according to each protocol and sequenced them to equivalent depth. Overall, SSMethylSeq and CpGiant covered >95% of their designed capture regions, whereas ERRBS covered 70% of its expected MspI regions. Methylation levels were concordant across the platforms. Annotation of the CpG units to genomic features yielded roughly the same proportions of feature types for each platform. SSMethylSeq and CpGiant are the most similar and cover marginally more annotated regions than ERRBS. However, the number of CpG units shared by all methods was low, about 26% of the CpG units of any single platform. We conclude that the capture-based methods are largely consistent in terms of covered CpG loci, although ERRBS provides comparable data at a significantly reduced price. 
Furthermore, library preparation for ERRBS can be performed with as little as 75 ng of starting material, whereas the capture hybridization techniques require micrograms.
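The overlap figure (CpG units shared by all methods, as a fraction of each platform's units) and the cross-platform methylation comparison reduce to set intersection and per-unit methylation levels. A minimal sketch, assuming each CpG unit is keyed by a hypothetical (chromosome, position) pair with methylated and unmethylated read counts, and that the methylation level is the fraction of methylated reads; the platform data and function names are hypothetical, and real pipelines typically also apply minimum-depth filters and may merge CpG strands.

```python
def shared_fraction(cpgs_by_platform):
    """Return the CpG units covered by every platform and, for each platform,
    the fraction of its own units that fall in that shared set."""
    platforms = list(cpgs_by_platform.values())
    shared = set(platforms[0]).intersection(*platforms[1:])
    fractions = {name: len(shared) / len(cpgs)
                 for name, cpgs in cpgs_by_platform.items()}
    return shared, fractions


def methylation_level(methylated, unmethylated):
    """Fraction of reads reporting a methylated C at a CpG unit."""
    return methylated / (methylated + unmethylated)


def mean_abs_difference(counts_a, counts_b, units):
    """Mean absolute difference in methylation level over the given units."""
    diffs = [abs(methylation_level(*counts_a[u]) - methylation_level(*counts_b[u]))
             for u in units]
    return sum(diffs) / len(diffs)


# Hypothetical per-unit (methylated, unmethylated) read counts.
errbs = {("chr1", 100): (8, 2), ("chr1", 200): (1, 9),
         ("chr1", 300): (5, 5), ("chr2", 50): (9, 1)}
ssmethylseq = {("chr1", 100): (7, 3), ("chr1", 200): (2, 8),
               ("chr2", 50): (9, 1), ("chr3", 10): (0, 10)}
cpgiant = {("chr1", 100): (8, 2), ("chr2", 50): (10, 0),
           ("chr3", 10): (1, 9), ("chr3", 20): (4, 6)}

shared, fractions = shared_fraction(
    {"ERRBS": set(errbs), "SSMethylSeq": set(ssmethylseq), "CpGiant": set(cpgiant)})
print(sorted(shared), fractions)
print(mean_abs_difference(errbs, ssmethylseq, shared))
```

Because the shared set is the intersection across all three platforms, it can be small relative to every individual platform even when methylation levels on those shared units agree closely.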
Project description: Background: Alternative splicing and isoform-level expression profiling is an emerging field of interest within genomics. Splicing-sensitive microarrays, with probes targeted to individual exons or exon junctions, are becoming increasingly popular as tools capable of both expression profiling and finer-scale isoform detection. Despite their intuitive appeal, relatively little is known about the performance of such tools, particularly in comparison with more traditional 3’-targeted microarrays. Here, we use the well-studied Microarray Quality Control (MAQC) dataset to benchmark the Affymetrix Exon Array and compare it with two other popular platforms: Illumina and Affymetrix U133. Results: We show that at the gene expression level, the Exon Array performs comparably to the two 3’-targeted platforms. However, the inter-platform correlation involving the Exon Array is slightly lower than that between the two 3’ arrays. We show that some of the discrepancies stem from the RNA amplification protocols; for example, the Exon Array can detect the expression of non-polyadenylated transcripts. More importantly, we show that many other differences result from the Exon Array's ability to monitor more detailed isoform-level changes; several examples illustrate that changes detected by the 3’ platforms are actually isoform variations, and that the nature of these variations can be resolved using Exon Array data. Finally, we show how the Exon Array can be used to detect alternative isoform differences, such as alternative splicing, transcript termination, and alternative promoter usage. We discuss the possible pitfalls and false positives resulting from isoform-level analysis. Conclusions: The Exon Array is a valuable tool that can be used to profile gene expression while providing important additional information about which gene isoforms are expressed and which vary. 
However, analysis of alternative splicing requires much more hands-on effort and visualization of results to interpret the data correctly, and it generally yields considerably higher false-positive rates than expression analysis. One of the main sources of error in the MAQC dataset is variation in amplification efficiency across transcripts, which existing statistical methods do not adequately correct. We outline approaches to reduce such errors by filtering out potentially problematic data. This SuperSeries is composed of the SubSeries listed below.
Project description: Screening for gene copy-number alterations (CNAs) has improved with the application of genome-wide microarrays, and SNP arrays additionally allow analysis of loss of heterozygosity (LOH). Here we analyzed 10 chronic lymphocytic leukemia (CLL) samples using four different high-resolution platforms: BAC arrays (32K), oligonucleotide arrays (185K, Agilent), and two SNP arrays (250K, Affymetrix; 317K, Illumina). Cross-platform comparison revealed 29 concordantly detected CNAs, including known recurrent alterations, confirming that all platforms are powerful tools when screening for large aberrations. However, the detection of 32 additional regions in only 2-3 of the platforms revealed discrepancies in the detection of small CNAs, which often involved reported copy-number variations. LOH analysis revealed concordance mainly for large regions, but also showed numerous small, nonoverlapping regions and LOH escaping detection. Evaluation of baseline variation and copy-number ratio response showed the best performance for the Agilent platform and confirmed the robustness of BAC arrays. Accordingly, these platforms showed a higher degree of platform-specific CNAs. The SNP arrays displayed higher technical variation, although this was compensated for by their high density of elements. Affymetrix detected more CNAs than Illumina, whereas Illumina showed a lower noise level and a higher detection rate in the LOH analysis. Large-scale studies of genomic aberrations are now feasible, but new tools for LOH analysis are needed. Ten chronic lymphocytic leukemia (CLL) samples were analyzed using four different high-resolution platforms: 32K BAC arrays, 185K Agilent oligonucleotide arrays, 250K Affymetrix SNP arrays, and 317K Illumina SNP arrays.
Project description: Accurate quantification of transcript isoforms is crucial for understanding gene regulation, functional diversity, and cellular behavior. Existing methods using either short-read (SR) or long-read (LR) RNA sequencing have significant limitations: SR sequencing provides high depth but struggles with isoform deconvolution, while LR sequencing offers isoform resolution at the cost of lower depth, higher noise, and technical biases. Addressing this gap, we introduce Multi-Platform Aggregation and Quantification of Transcripts (MPAQT), a generative model that combines the complementary strengths of different sequencing platforms to achieve state-of-the-art isoform-resolved transcript quantification, as demonstrated by extensive simulations and experimental benchmarks. Applying MPAQT to an in vitro model of human embryonic stem cell differentiation into cortical neurons, followed by machine learning-based modeling of mRNA abundance determinants, reveals the role of untranslated regions (UTRs) in isoform regulation through isoform-specific interactions with RNA-binding proteins that modulate mRNA stability. These findings highlight MPAQT's potential to enhance our understanding of transcriptomic complexity and underline the role of splicing-independent post-transcriptional mechanisms in shaping the isoform and exon usage landscape of the cell.
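MPAQT's generative model is not specified in this description. To illustrate only the general idea of pooling evidence across platforms, here is a minimal, generic expectation-maximization (EM) sketch over read compatibility classes, a standard textbook approach to isoform quantification. It assumes equal effective isoform lengths, no sequencing-error or bias model, and a single per-platform weight; it is not MPAQT's method, and all names and counts are hypothetical.

```python
def compatibility_classes(counts, platform_weight=1.0):
    """Turn {compatible_isoforms: read_count} into weighted (isoforms, weight) pairs."""
    return [(tuple(compat), n * platform_weight) for compat, n in counts.items()]


def pooled_em(classes, isoforms, max_iter=1000, tol=1e-10):
    """Estimate relative isoform abundances (summing to 1) from pooled classes.

    E-step: split each class's weight among its compatible isoforms in
    proportion to the current abundance estimates.
    M-step: set each abundance to its share of the total assigned weight.
    """
    theta = {k: 1.0 / len(isoforms) for k in isoforms}
    total = sum(w for _, w in classes)
    for _ in range(max_iter):
        expected = {k: 0.0 for k in isoforms}
        for compat, w in classes:
            z = sum(theta[k] for k in compat)
            for k in compat:
                expected[k] += w * theta[k] / z
        new_theta = {k: expected[k] / total for k in isoforms}
        delta = max(abs(new_theta[k] - theta[k]) for k in isoforms)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Hypothetical counts: long reads resolve isoforms, short reads are ambiguous.
long_reads = {("A",): 30, ("B",): 10}
short_reads = {("A", "B"): 100}

short_only = pooled_em(compatibility_classes(short_reads), ["A", "B"])
pooled = pooled_em(
    compatibility_classes(long_reads) + compatibility_classes(short_reads),
    ["A", "B"],
)
print(short_only)  # stays at the uniform starting point
print(pooled)      # approximately {'A': 0.75, 'B': 0.25}
```

With only the ambiguous short-read class, the EM has no information to separate A from B and remains at its uniform start; adding the isoform-resolving long reads pins the estimate near 0.75/0.25. The `platform_weight` argument is one crude, hypothetical way to down-weight a noisier platform; a full generative model would instead describe each platform's sampling and error process explicitly.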