Dataset Information

MAQC

ABSTRACT: The MicroArray Quality Control (MAQC) project was initiated to address concerns about the reliability of microarray technology, as well as other performance and data analysis issues. We demonstrate the consistency of results within a platform across test sites, as well as the high level of cross-platform concordance in terms of genes identified as differentially expressed. The MAQC study provides a rich resource that will help build consensus on the use of microarrays in research, clinical and regulatory settings. Manuscripts related to the MAQC project were published in Nature Biotechnology, 24(9), September 2006. More information about the MAQC project can be found at http://edkb.fda.gov/MAQC/.
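Cross-platform concordance of differentially expressed gene lists can be quantified in several ways. The sketch below uses one simple, direction-aware overlap measure purely as an illustration; it is an assumption, not necessarily the exact metric used by the MAQC project. It also assumes that probes from both platforms have already been mapped to common gene identifiers, and all gene names and calls shown are hypothetical.

```python
from typing import Dict


def directional_overlap(deg_a: Dict[str, int], deg_b: Dict[str, int]) -> float:
    """Percentage of genes called differentially expressed on platform A that are
    also called on platform B with the same direction of change.

    Each dict maps a (hypothetical) common gene identifier to +1 (up) or -1 (down).
    The measure is asymmetric: it is normalized by the size of deg_a.
    """
    if not deg_a:
        return 0.0
    agreeing = sum(1 for gene, direction in deg_a.items() if deg_b.get(gene) == direction)
    return 100.0 * agreeing / len(deg_a)


if __name__ == "__main__":
    platform_a = {"GENE1": +1, "GENE2": -1, "GENE3": +1, "GENE4": +1}
    platform_b = {"GENE1": +1, "GENE2": +1, "GENE3": +1, "GENE5": -1}
    print(directional_overlap(platform_a, platform_b))  # 50.0
```

Requiring agreement in direction, not just membership, means a gene called "up" on one platform and "down" on the other counts as discordant.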

Expression data on four titration pools prepared from two distinct reference RNA samples (A and B) were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Sample A = Stratagene Universal Human Reference RNA (UHRR, Catalog #740000); Sample B = Ambion Human Brain Reference RNA (HBRR, Catalog #6050); Sample C = Samples A and B mixed at a 75%:25% ratio (A:B); and Sample D = Samples A and B mixed at a 25%:75% ratio (A:B). In general, each microarray platform was tested at three sites, and each sample was run in five replicates at each test site. Samples (hybridizations) were named according to the convention Platform_Testsite_SampleReplicate. For example, AFX_2_B1 denotes the hybridization (array) from platform AFX processed by test site 2 for the first replicate of sample B. Platform codes: ABI = Applied Biosystems (microarray); AFX = Affymetrix; AG1 = Agilent one-color; AGL = Agilent two-color; GEH = GE Healthcare; ILM = Illumina; NCI = NCI two-color (Operon oligos); EPP = Eppendorf; TAQ = TaqMan (Applied Biosystems); QGN = QuantiGene (Panomics); GEX = StaRT-PCR (Gene Express); H25K = TeleChem two-color; H25K1 = TeleChem one-color; BIO = CapitalBio two-color (Operon oligos); BIO1 = CapitalBio one-color (Operon oligos); OPN = Operon two-color (Operon oligos); NMC = Norwegian Microarray Consortium two-color (Operon oligos).
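The naming convention and titration design above can be sketched in code. The parser below splits a hybridization name such as AFX_2_B1 into its parts and checks the platform code against the list above. The `expected_signal` helper is a hypothetical illustration that assumes measured signal scales linearly with the fraction of each RNA in a pool. That is a simplification: it ignores differences in mRNA content between UHRR and HBRR and any platform non-linearity. All function and variable names here are illustrative.

```python
import re
from typing import NamedTuple

# Platform codes from the dataset description.
PLATFORMS = {
    "ABI": "Applied Biosystems (microarray)",
    "AFX": "Affymetrix",
    "AG1": "Agilent one-color",
    "AGL": "Agilent two-color",
    "GEH": "GE Healthcare",
    "ILM": "Illumina",
    "NCI": "NCI two-color (Operon oligos)",
    "EPP": "Eppendorf",
    "TAQ": "TaqMan (Applied Biosystems)",
    "QGN": "QuantiGene (Panomics)",
    "GEX": "StaRT-PCR (Gene Express)",
    "H25K": "TeleChem two-color",
    "H25K1": "TeleChem one-color",
    "BIO": "CapitalBio two-color (Operon oligos)",
    "BIO1": "CapitalBio one-color (Operon oligos)",
    "OPN": "Operon two-color (Operon oligos)",
    "NMC": "Norwegian Microarray Consortium two-color (Operon oligos)",
}

# Fraction of sample A (UHRR) in each pool; the remainder is sample B (HBRR).
FRACTION_A = {"A": 1.0, "B": 0.0, "C": 0.75, "D": 0.25}

# Platform_Testsite_SampleReplicate, e.g. "AFX_2_B1".
NAME_RE = re.compile(r"^([A-Z0-9]+)_(\d+)_([ABCD])(\d+)$")


class Hybridization(NamedTuple):
    platform: str
    site: int
    sample: str
    replicate: int


def parse_name(name: str) -> Hybridization:
    """Split a hybridization name into platform, test site, sample and replicate."""
    match = NAME_RE.match(name)
    if match is None or match.group(1) not in PLATFORMS:
        raise ValueError(f"not a recognized MAQC hybridization name: {name!r}")
    platform, site, sample, replicate = match.groups()
    return Hybridization(platform, int(site), sample, int(replicate))


def expected_signal(a: float, b: float, sample: str) -> float:
    """Expected signal in a pool, ASSUMING signal scales linearly with RNA input.

    a and b are the signals of the same gene in pure samples A and B.
    """
    f = FRACTION_A[sample]
    return f * a + (1.0 - f) * b


if __name__ == "__main__":
    print(parse_name("AFX_2_B1"))
    # Hybridization(platform='AFX', site=2, sample='B', replicate=1)
    print(expected_signal(100.0, 20.0, "C"))  # 80.0
```

Because the regular expression stops at the first underscore, multi-character codes such as H25K1 parse correctly.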

ORGANISM(S): Homo sapiens

SUBMITTER: Leming Shi

PROVIDER: E-TABM-132 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Publications

The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.

Leming Shi, Laura H. Reid, Wendell D. Jones, Richard Shippy, Janet A. Warrington, Shawn C. Baker, Patrick J. Collins, Francoise de Longueville, Ernest S. Kawasaki, Kathleen Y. Lee, Yuling Luo, Yongming Andrew Sun, James C. Willey, Robert A. Setterquist, Gavin M. Fischer, Weida Tong, Yvonne P. Dragan, David J. Dix, Felix W. Frueh, Frederico M. Goodsaid, Damir Herman, Roderick V. Jensen, Charles D. Johnson, Edward K. Lobenhofer, Raj K. Puri, Uwe Scherf, Jean Thierry-Mieg, Charles Wang, Mike Wilson, Paul K. Wolber, Lu Zhang, Shashi Amur, Wenjun Bao, Catalin C. Barbacioru, Anne Bergstrom Lucas, Vincent Bertholet, Cecilie Boysen, Bud Bromley, Donna Brown, Alan Brunner, Roger Canales, Xiaoxi Megan Cao, Thomas A. Cebula, James J. Chen, Jing Cheng, Tzu-Ming Chu, Eugene Chudin, John Corson, J. Christopher Corton, Lisa J. Croner, Christopher Davies, Timothy S. Davison, Glenda Delenstarr, Xutao Deng, David Dorris, Aron C. Eklund, Xiao-hui Fan, Hong Fang, Stephanie Fulmer-Smentek, James C. Fuscoe, Kathryn Gallagher, Weigong Ge, Lei Guo, Xu Guo, Janet Hager, Paul K. Haje, Jing Han, Tao Han, Heather C. Harbottle, Stephen C. Harris, Eli Hatchwell, Craig A. Hauser, Susan Hester, Huixiao Hong, Patrick Hurban, Scott A. Jackson, Hanlee Ji, Charles R. Knight, Winston P. Kuo, J. Eugene LeClerc, Shawn Levy, Quan-Zhen Li, Chunmei Liu, Ying Liu, Michael J. Lombardi, Yunqing Ma, Scott R. Magnuson, Botoul Maqsodi, Tim McDaniel, Nan Mei, Ola Myklebost, Baitang Ning, Natalia Novoradovskaya, Michael S. Orr, Terry W. Osborn, Adam Papallo, Tucker A. Patterson, Roger G. Perkins, Elizabeth H. Peters, Ron Peterson, Kenneth L. Philips, P. Scott Pine, Lajos Pusztai, Feng Qian, Hongzu Ren, Mitch Rosen, Barry A. Rosenzweig, Raymond R. Samaha, Mark Schena, Gary P. Schroth, Svetlana Shchegrova, Dave D. Smith, Frank Staedtler, Zhenqiang Su, Hongmei Sun, Zoltan Szallasi, Zivana Tezak, Danielle Thierry-Mieg, Karol L. Thompson, Irina Tikhonova, Yaron Turpaz, Beena Vallanat, Christophe Van, Stephen J. Walker, Sue Jane Wang, Yonghong Wang, Russ Wolfinger, Alex Wong, Jie Wu, Chunlin Xiao, Qian Xie, Jun Xu, Wen Yang, Liang Zhang, Sheng Zhong, Yaping Zong, William Slikker

Nature Biotechnology 24(9), 1 September 2006

Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data o ...

PMID: 16964229

Similar Datasets

Project description: The MAQC-II Project: A comprehensive study of common practices for the development and validation of microarray-based predictive models. The second phase of the MicroArray Quality Control (MAQC-II) project evaluated common practices for developing and validating microarray-based models aimed at predicting toxicological and clinical endpoints. The purposes of the MAQC-II project were to survey approaches to genomic model development in order to understand sources of variability in prediction performance, and to assess the influence of endpoint signal strength in the data. Thirty-six teams developed classifiers for 13 diverse endpoints, ranging from easy to difficult to predict, using six relatively large training data sets (three preclinical/toxicogenomic and three clinical). By providing the same data sets to many organizations for analysis, without restricting their data analysis protocols (DAPs), the project made it possible to evaluate to what extent, if any, results depend on the team that performs the analysis. These analyses collectively produced >18,000 models that were challenged by independent, blinded validation sets generated for MAQC-II. The cross-validated performance estimates for models developed under good practices are predictive of the blinded validation performance. The achievable prediction performance is largely determined by the intrinsic predictability of the endpoint, and simple data analysis methods often perform as well as more complicated approaches. Multiple models of comparable performance can be developed for a given endpoint, and the stability of gene lists correlates with endpoint predictability. Importantly, similar conclusions were reached when >12,000 new models were generated by swapping the original training and validation sets. Description of six data sets including 13 prediction endpoints: (Summarized in GSE16716_MAQC-II_Datasets_Overview.pdf attached as a supplementary file.
For more details, see the MAQC-II main paper and its references for the individual data sets.) The MAQC-II predictive modeling was limited to binary classification problems; therefore, continuous endpoint values such as overall survival (OS) and event-free survival (EFS) times were dichotomized using a "milestone" cutoff applied to the censored survival data. Prediction endpoints were chosen to span a wide range of prediction difficulty. Two endpoints, H (CPS1) and L (NEP_S), representing the gender of the patients, were used as positive control endpoints, since they are easily predicted from microarray data. Two other endpoints, I (CPR1) and M (NEP_R), representing randomly assigned class labels, were designed to serve as negative control endpoints, since they should not be predictable. Data analysis teams were not aware of the characteristics of endpoints H, I, L, and M until their swap prediction results had been submitted. If a data analysis protocol failed to yield models that accurately predict endpoints H and L, or appeared to yield models that accurately predict endpoints I and M, something must have gone wrong. The Hamner data set (endpoint A) was provided by The Hamner Institutes for Health Sciences (Research Triangle Park, NC, USA). The study objective was to use microarray gene expression data from the lungs of female B6C3F1 mice exposed to chemicals for 13 weeks to predict increased lung tumor incidence in the 2-year rodent cancer bioassays of the National Toxicology Program. If successful, the results may form the basis of a more efficient and economical approach for evaluating the carcinogenic activity of chemicals. Microarray analysis was performed using Affymetrix Mouse Genome 430 2.0 arrays on three to four mice per treatment group; a total of 70 mice were analyzed and used as the MAQC-II training set. Data from another set of 88 mice were collected later and provided as the MAQC-II external validation set.
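The positive/negative control logic above can be expressed as a simple sanity check on a data analysis protocol's results. This is a sketch under stated assumptions: the scores are on a scale where 0.5 is chance (for example, AUC), and the thresholds `positive_min` and `negative_max` are hypothetical defaults, not values taken from MAQC-II. The example scores are invented.

```python
from typing import Dict, List

# Endpoint letters as described above.
POSITIVE_CONTROLS = ("H", "L")  # patient gender: expected to be easy to predict
NEGATIVE_CONTROLS = ("I", "M")  # random class labels: expected to be unpredictable


def control_warnings(scores: Dict[str, float],
                     positive_min: float = 0.9,
                     negative_max: float = 0.6) -> List[str]:
    """Flag a protocol whose control endpoints behave unexpectedly.

    `scores` maps endpoint letter -> validation score where 0.5 is chance
    (e.g. AUC). Both thresholds are hypothetical defaults.
    """
    warnings = []
    for endpoint in POSITIVE_CONTROLS:
        if scores[endpoint] < positive_min:
            warnings.append(f"positive control {endpoint} poorly predicted: {scores[endpoint]:.2f}")
    for endpoint in NEGATIVE_CONTROLS:
        if scores[endpoint] > negative_max:
            warnings.append(f"negative control {endpoint} unexpectedly predictable: {scores[endpoint]:.2f}")
    return warnings


if __name__ == "__main__":
    healthy = {"H": 0.97, "L": 0.95, "I": 0.51, "M": 0.48}
    suspect = {"H": 0.97, "L": 0.95, "I": 0.74, "M": 0.48}
    print(control_warnings(healthy))  # []
    print(control_warnings(suspect))  # ['negative control I unexpectedly predictable: 0.74']
```

A non-empty result does not identify what went wrong (for example, information leakage or a labeling error), only that the protocol deserves a closer look.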
The Iconix data set (endpoint B) was provided by Iconix Biosciences, Inc. (Mountain View, CA, USA). The study objective was to assess hepatic tumor induction by non-genotoxic chemicals after short-term exposure; there are currently no accurate, well-validated short-term tests to identify non-genotoxic hepatic tumorigens, so an expensive 2-year rodent bioassay is required before a risk assessment can begin. The training set consists of hepatic gene expression data from 216 male Sprague-Dawley rats treated for 5 days with one of 76 structurally and mechanistically diverse non-genotoxic hepatocarcinogens and non-hepatocarcinogens. The validation set consists of 201 male Sprague-Dawley rats treated for 5 days with one of 68 structurally and mechanistically diverse non-genotoxic hepatocarcinogens and non-hepatocarcinogens. Gene expression data were generated using the Amersham Codelink Uniset Rat 1 Bioarray (GE Healthcare, Piscataway, NJ). The training and validation sets were separated by the time at which the microarray data were collected: microarrays processed earlier in the study were used for training, and those processed later were used for validation. The NIEHS data set (endpoint C) was provided by the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (Research Triangle Park, NC, USA). The study objective was to use microarray gene expression data acquired from the livers of rats exposed to hepatotoxicants to build classifiers for predicting liver necrosis. The gene expression "compendium" data set was collected from 418 rats exposed to one of eight compounds (1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine, and diquat dibromide). All eight compounds were studied using standardized procedures, i.e.
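The Iconix split described above is temporal rather than random: arrays processed earlier form the training set and later ones the validation set. A minimal sketch follows, assuming each array carries a processing date and that a single cutoff date separates the two sets. The sample IDs, dates, and cutoff are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def temporal_split(arrays: List[Tuple[str, date]],
                   cutoff: date) -> Tuple[List[str], List[str]]:
    """Split arrays by processing date: before `cutoff` -> training, on/after -> validation.

    IDs, dates, and the cutoff are hypothetical; the description only states that
    earlier-processed arrays formed the training set and later ones the validation set.
    """
    ordered = sorted(arrays, key=lambda item: item[1])
    training = [sample_id for sample_id, processed in ordered if processed < cutoff]
    validation = [sample_id for sample_id, processed in ordered if processed >= cutoff]
    return training, validation


if __name__ == "__main__":
    arrays = [("rat_001", date(2001, 1, 10)),
              ("rat_002", date(2002, 6, 1)),
              ("rat_003", date(2001, 9, 5))]
    print(temporal_split(arrays, cutoff=date(2002, 1, 1)))
    # (['rat_001', 'rat_003'], ['rat_002'])
```

A time-based split like this tends to be a stricter test than a random split, because the validation arrays can differ from the training arrays in processing batch as well as in sample.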
a common array platform (Affymetrix Rat 230 2.0 microarray), common experimental procedures, and common data retrieval and analysis processes. Briefly, for each compound, four to six male, 12-week-old F344 rats were exposed to a low dose, mid dose(s), and a high dose of the toxicant and sacrificed 6, 24, and 48 h later. At necropsy, the liver was harvested for RNA extraction, histopathology, and clinical chemistry assessments. The human breast cancer (BR) data set (endpoints D and E) was contributed by the University of Texas M. D. Anderson Cancer Center (MDACC, Houston, TX, USA). Gene expression data from 230 stage I-III breast cancers were generated from fine-needle aspiration specimens of newly diagnosed breast cancers collected before any therapy. The biopsy specimens were collected sequentially during a prospective pharmacogenomic marker discovery study between 2000 and 2008. These specimens represent 70-90% pure neoplastic cells with minimal stromal contamination. Patients received 6 months of preoperative (neoadjuvant) chemotherapy including paclitaxel, 5-fluorouracil, cyclophosphamide, and doxorubicin, followed by surgical resection of the cancer. Response to preoperative chemotherapy was categorized as either pathological complete response (pCR = no residual invasive cancer in the breast or lymph nodes) or residual invasive cancer (RD) and used as endpoint D for prediction. Endpoint E is the clinical estrogen-receptor status as established by immunohistochemistry. RNA extraction and gene expression profiling were performed in multiple batches over time using Affymetrix U133A microarrays. Genomic analysis of a subset of this sequentially accrued patient population was reported previously. For each endpoint, the first 130 cases were used as a training set and the next 100 cases as an independent validation set.
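Two rules from the breast cancer description translate directly into code: endpoint D is pCR only when no residual invasive cancer remains in either the breast or the lymph nodes, and cases are split by accrual order (first 130 for training, next 100 for validation). The sketch below assumes the cases are already sorted by accrual. The `BreastCase` record and its fields are hypothetical, and the toy cohort is invented.

```python
from typing import List, NamedTuple, Tuple


class BreastCase(NamedTuple):
    case_id: str           # hypothetical identifier
    residual_breast: bool  # residual invasive cancer in the breast after chemotherapy
    residual_nodes: bool   # residual invasive cancer in the lymph nodes


def endpoint_d(case: BreastCase) -> str:
    """pCR only if neither the breast nor the lymph nodes contain residual invasive cancer."""
    return "RD" if (case.residual_breast or case.residual_nodes) else "pCR"


def accrual_split(cases: List[BreastCase], n_train: int = 130,
                  n_valid: int = 100) -> Tuple[List[BreastCase], List[BreastCase]]:
    """First n_train cases (in accrual order) for training, the next n_valid for validation."""
    if len(cases) < n_train + n_valid:
        raise ValueError("not enough cases for the requested split")
    return cases[:n_train], cases[n_train:n_train + n_valid]


if __name__ == "__main__":
    cases = [BreastCase(f"case{i:03d}", residual_breast=(i % 3 == 0), residual_nodes=False)
             for i in range(230)]
    train, valid = accrual_split(cases)
    print(len(train), len(valid), valid[0].case_id)  # 130 100 case130
    print(endpoint_d(BreastCase("x", residual_breast=False, residual_nodes=True)))  # RD
```

Note that residual disease in the lymph nodes alone is enough to make a case RD.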
The multiple myeloma (MM) data set (endpoints F, G, H, and I) was contributed by the Myeloma Institute for Research and Therapy at the University of Arkansas for Medical Sciences (UAMS, Little Rock, AR, USA). Gene expression profiling of highly purified bone marrow plasma cells was performed in newly diagnosed patients with MM. The training set consisted of 340 cases enrolled in Total Therapy 2 (TT2), and the validation set comprised 214 patients enrolled in Total Therapy 3 (TT3). Plasma cells were enriched by anti-CD138 immunomagnetic bead selection of mononuclear cell fractions of bone marrow aspirates in a central laboratory. All samples applied to the microarray contained more than 85% plasma cells, as determined by two-color flow cytometry (CD38+ and CD45-/dim) performed after selection. Dichotomized overall survival (OS) and event-free survival (EFS) were determined using a two-year milestone cutoff. A gene expression model of high-risk multiple myeloma was developed and validated by the data provider and later validated in three additional independent data sets. The neuroblastoma (NB) data set (endpoints J, K, L, and M) was contributed by the Children's Hospital of the University of Cologne, Germany. Tumor samples were checked by a pathologist prior to RNA isolation; only samples with ≥60% tumor content were used, and total RNA was isolated from ~50 mg of snap-frozen neuroblastoma tissue obtained before chemotherapeutic treatment. First, 502 pre-existing 11K Agilent dye-flipped, dual-color replicate profiles for 251 patients were provided. Of these, profiles of 246 neuroblastoma samples passed an independent MAQC-II quality assessment by majority decision and formed the MAQC-II training data set. Subsequently, 514 dye-flipped dual-color 11K replicate profiles for 256 independent neuroblastoma tumor samples were generated, and profiles for 253 samples were selected to form the MAQC-II validation set.
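The two-year milestone dichotomization of OS and EFS can be sketched as follows. The description specifies only the milestone. The label encoding (1 = event before the milestone, 0 = event-free at the milestone) and the choice to return `None` for patients censored before the milestone are assumptions made here. An event occurring exactly at two years is counted as 0 in this sketch.

```python
from typing import Optional


def milestone_label(time_years: float, event: bool,
                    milestone_years: float = 2.0) -> Optional[int]:
    """Dichotomize a possibly censored survival time at a milestone (assumed encoding).

    1    -> the event (death for OS, event for EFS) occurred before the milestone
    0    -> the patient reached the milestone without the event
    None -> follow-up was censored before the milestone, so the class is unknown
    """
    if event and time_years < milestone_years:
        return 1
    if time_years >= milestone_years:
        return 0
    return None  # censored before the milestone


if __name__ == "__main__":
    print(milestone_label(1.2, event=True))   # 1
    print(milestone_label(3.5, event=False))  # 0
    print(milestone_label(4.0, event=True))   # 0 (event after the milestone)
    print(milestone_label(0.8, event=False))  # None
```

Patients labeled `None` cannot be assigned to either class without further assumptions, which is why early-censored cases are commonly excluded from a milestone-based binary endpoint.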
Of note, for one patient of the validation set, two different tumor samples were analyzed using both versions of the 2x11K microarray (see below). All dual-color gene-expression profiles of the MAQC-II training set were generated using a customized 2x11K neuroblastoma-related microarray. Furthermore, 20 patients of the MAQC-II validation set were also profiled using this microarray. Dual-color profiles of the remaining patients of the MAQC-II validation set were generated using a slightly revised version of the 2x11K microarray. This version V2.0 of the array comprised 200 novel oligonucleotide probes, whereas 100 oligonucleotide probes of the original design were removed due to consistently low expression values (near background) observed in the training set profiles. These minor modifications of the microarray design resulted in a total of 9,986 probes present on both versions of the 2x11K microarray. The experimental protocol did not differ between the two sets, and gene-expression profiles were generated as described. Furthermore, single-color gene-expression profiles were generated for 478/499 neuroblastoma samples of the MAQC-II dual-color training and validation sets (training set 244/246; validation set 234/253). For the remaining 21 samples, no single-color data were available, due to either a shortage of tumor material from these patients (n=15), poor experimental quality of the generated single-color profiles (n=5), or the correlation of one single-color profile to two different dual-color profiles for the one patient profiled with both versions of the 2x11K microarray (n=1). Single-color gene-expression profiles were generated using customized 4x44K oligonucleotide microarrays produced by Agilent Technologies (Palo Alto, CA, USA). These 4x44K microarrays included all probes represented on Agilent's Whole Human Genome Oligo Microarray and all probes of version V2.0 of the 2x11K customized microarray that were not present in the former probe set.
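The array-design changes above are set operations on probe identifiers: V2.0 drops the low-expression probes and adds novel ones, the probes common to both versions are an intersection, and the 4x44K content is the Whole Human Genome probe set plus the V2.0 probes it lacks. The sketch below uses tiny, hypothetical probe sets so the arithmetic is easy to follow. With the real designs, the intersection would correspond to the 9,986 shared probes reported above.

```python
from typing import Set


def revise_design(v1: Set[str], removed: Set[str], added: Set[str]) -> Set[str]:
    """V2.0 content: original probes minus the removed low-expression probes, plus novel probes."""
    return (v1 - removed) | added


def shared_probes(v1: Set[str], v2: Set[str]) -> Set[str]:
    """Probes present on both array versions."""
    return v1 & v2


def single_color_design(whole_genome: Set[str], v2: Set[str]) -> Set[str]:
    """4x44K content: every Whole Human Genome probe, plus the V2.0 probes it lacks."""
    return whole_genome | (v2 - whole_genome)


if __name__ == "__main__":
    v1 = {f"p{i}" for i in range(10)}   # toy original design: 10 probes
    v2 = revise_design(v1, removed={"p0", "p1"}, added={"n0", "n1", "n2", "n3"})
    print(len(v2), len(shared_probes(v1, v2)))          # 12 8
    whole_genome = {"p5", "w1"}          # toy stand-in for the commercial array
    print(len(single_color_design(whole_genome, v2)))  # 13
```

Writing `whole_genome | (v2 - whole_genome)` mirrors the wording of the description. It is equivalent to the plain union `whole_genome | v2`.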
Labeling and hybridization were performed following the manufacturer's protocol as described. This SuperSeries is composed of the SubSeries listed below.
