Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequence Quality Control consortium

ABSTRACT: We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for sequence discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for these platforms as well as for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.

The study used the well-characterized reference RNA samples A (pooled cell lines) and B (human brain) from the MAQC consortium, adding spike-ins of synthetic RNA from the External RNA Control Consortium (ERCC). Samples C and D were then constructed by combining A and B in known mixing ratios, 3:1 and 1:3, respectively. All samples were distributed to several independent sites for RNA-seq library construction and profiling by Illumina HiSeq 2000 and LifeTech SOLiD 5500 platforms. In addition, vendors created their own cDNA libraries, which were then distributed to each test site in order to examine the degree of a “site effect” independent of the library preparation process. To support an assessment of gene models, samples A and B were also sequenced at independent sites on the Roche 454 platform, providing longer reads. For comparison with other technologies, these data were also compared to the MAQC-I Affymetrix U133 Plus2 microarray and several current microarray platforms, and were assessed by 20,801 PrimePCR reactions.
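The known mixing ratios of samples C and D allow a simple consistency check on measured expression levels. The sketch below is illustrative only: it assumes a plain linear blend of A and B on the linear (not log) scale and ignores any difference in mRNA content between the two samples. The function name and the example expression values are hypothetical and do not come from the SEQC data.

```python
def expected_mixture(level_a: float, level_b: float, fraction_a: float) -> float:
    """Expected level in a mixture, assuming a simple linear blend of A and B."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return fraction_a * level_a + (1.0 - fraction_a) * level_b


# Fraction of sample A implied by the stated A:B ratios (C = 3:1, D = 1:3).
FRACTION_A = {"C": 3 / 4, "D": 1 / 4}

# Hypothetical linear-scale expression levels for one gene (illustration only).
level_a, level_b = 8.0, 4.0

expected_c = expected_mixture(level_a, level_b, FRACTION_A["C"])
expected_d = expected_mixture(level_a, level_b, FRACTION_A["D"])
```

Under this assumption, a gene measured in C or D should fall between its A and B values, closer to A in sample C and closer to B in sample D.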

ORGANISM(S): Homo sapiens

SUBMITTER: Leming Shi

PROVIDER: E-GEOD-56457 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

Project description: The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and analysis issues. We demonstrate the consistency of results within a platform across test sites as well as the high level of cross-platform concordance in terms of genes identified as differentially expressed. The MAQC study provides a rich resource that will help build consensus on the use of microarrays in research, clinical and regulatory settings. Manuscripts related to the MAQC project have been published in Nature Biotechnology, 24(9), September 2006. More information about the MAQC project can be found at http://edkb.fda.gov/MAQC/.

Expression data from two distinct reference RNA samples (A and B) in four titration pools were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Sample A = Stratagene Universal Human Reference RNA (UHRR, Catalog #740000); Sample B = Ambion Human Brain Reference RNA (HBRR, Catalog #6050); Sample C = Samples A and B mixed at 75%:25% ratio (A:B); and Sample D = Samples A and B mixed at 25%:75% ratio (A:B). In general, each microarray platform was tested at three sites and each sample was tested in five replicates at each test site. Samples (hybridizations) were named according to the following convention: Platform_Testsite_SampleReplicate. For example, AFX_2_B1 represents the hybridization (array) from platform AFX processed by test site 2 for the first replicate of sample B.
Assignment of platform code: ABI = Applied Biosystems (microarray); AFX = Affymetrix; AG1 = Agilent one-color; AGL = Agilent two-color; GEH = GE Healthcare; ILM = Illumina; NCI = NCI two-color (Operon oligos); EPP = Eppendorf; TAQ = TaqMan (Applied Biosystems); QGN = QuantiGene (Panomics); GEX = StaRT-PCR (Gene Express); H25K = TeleChem two-color; H25K1 = TeleChem one-color; BIO = CapitalBio two-color (Operon oligos); BIO1 = CapitalBio one-color (Operon oligos); OPN = Operon two-color (Operon oligos); NMC = Norwegian Microarray Consortium two-color (Operon oligos).
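The naming convention and platform codes above can be decoded mechanically. The following is a minimal sketch, not an official MAQC tool: the regular expression assumes that the test site and replicate are plain integers and that the sample letter is one of A to D, which matches the AFX_2_B1 example but is not guaranteed for every file in the data set. The names `Hybridization` and `parse_hybridization_name` are hypothetical.

```python
import re
from typing import NamedTuple

# Platform codes as listed in the project description.
PLATFORM_CODES = {
    "ABI": "Applied Biosystems (microarray)",
    "AFX": "Affymetrix",
    "AG1": "Agilent one-color",
    "AGL": "Agilent two-color",
    "GEH": "GE Healthcare",
    "ILM": "Illumina",
    "NCI": "NCI two-color (Operon oligos)",
    "EPP": "Eppendorf",
    "TAQ": "TaqMan (Applied Biosystems)",
    "QGN": "QuantiGene (Panomics)",
    "GEX": "StaRT-PCR (Gene Express)",
    "H25K": "TeleChem two-color",
    "H25K1": "TeleChem one-color",
    "BIO": "CapitalBio two-color (Operon oligos)",
    "BIO1": "CapitalBio one-color (Operon oligos)",
    "OPN": "Operon two-color (Operon oligos)",
    "NMC": "Norwegian Microarray Consortium two-color (Operon oligos)",
}


class Hybridization(NamedTuple):
    platform: str   # platform code, e.g. "AFX"
    site: int       # test site number
    sample: str     # sample letter, "A" to "D"
    replicate: int  # replicate number


# Assumed pattern: Platform_Testsite_SampleReplicate, e.g. AFX_2_B1.
_NAME_RE = re.compile(r"^([A-Z0-9]+)_(\d+)_([A-D])(\d+)$")


def parse_hybridization_name(name: str) -> Hybridization:
    """Split a hybridization name into its four components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    platform, site, sample, replicate = match.groups()
    if platform not in PLATFORM_CODES:
        raise ValueError(f"unknown platform code: {platform!r}")
    return Hybridization(platform, int(site), sample, int(replicate))


example = parse_hybridization_name("AFX_2_B1")
```

Here `example` holds platform AFX, test site 2, sample B and replicate 1, and `PLATFORM_CODES[example.platform]` gives the full platform name.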
