Project description: This series includes a 32-array training dataset used to evaluate E-Predict normalization and similarity metric parameters, as well as 13 microarrays used as examples in Urisman et al. (2005). The training dataset includes 15 independent HeLa RNA hybridizations (microarrays 1-15), 10 independent nasal lavage samples positive for Respiratory Syncytial virus (microarrays 16-25), and 7 independent nasal lavage samples positive for Influenza A virus (microarrays 26-32). The examples include a serum sample positive for Hepatitis B virus (microarray 33), a nasal lavage sample positive for both Influenza A virus and Respiratory Syncytial virus (microarray 34), and culture samples of 11 distinct Human Rhinovirus serotypes (microarrays 35-45). Keywords = virus detection, E-Predict, species identification, metagenomics
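The idea of scoring candidate species by comparing a normalized hybridization pattern with each species' predicted profile can be sketched as follows. This is a minimal illustration, not the published E-Predict implementation: it assumes each candidate is represented by a hypothetical vector of predicted per-probe hybridization scores in the same probe order as the array, and `unit_sum`, `log2_shift`, `pearson`, `neg_euclidean`, and `rank_candidates` are illustrative names. The normalization and the similarity metric are passed in as parameters so that different combinations can be compared, which is the kind of parameter choice the training arrays were used to evaluate.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def unit_sum(v: Vector) -> List[float]:
    """Scale a non-negative vector so its entries sum to 1 (assumes a non-zero total)."""
    total = sum(v)
    return [x / total for x in v]


def log2_shift(v: Vector) -> List[float]:
    """Log-transform intensities, adding 1 so that zeros stay finite."""
    return [math.log2(x + 1.0) for x in v]


def pearson(x: Vector, y: Vector) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def neg_euclidean(x: Vector, y: Vector) -> float:
    """Negated Euclidean distance, so that larger values mean more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_candidates(
    observed: Vector,
    profiles: Dict[str, Vector],
    normalize: Callable[[Vector], List[float]] = unit_sum,
    similarity: Callable[[Vector, Vector], float] = pearson,
) -> List[Tuple[str, float]]:
    """Score every candidate profile against one array, most similar first."""
    obs = normalize(observed)
    scores = [(name, similarity(obs, normalize(p))) for name, p in profiles.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores
```

Pearson correlation ignores uniform rescaling, so unit-sum normalization only changes the ranking for scale-sensitive metrics such as the Euclidean distance; a nonlinear transform like `log2_shift` can affect either metric.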
Project description: RNA was isolated from 199 primary breast cancer patients and used to generate gene expression profiles. Samples 1-176 were used in another study (GEO Series GSE22820) and form the training dataset in this study; samples 200-222 form the validation set. These data were used to build a machine learning classifier for Estrogen Receptor (ER) status that predicts ER status using only three gene features.
Project description: The Hamner data set (endpoint A) was provided by The Hamner Institutes for Health Sciences (Research Triangle Park, NC, USA). The study objective was to use microarray gene expression data from the lungs of female B6C3F1 mice exposed to a 13-week chemical treatment to predict increased lung tumor incidence in the 2-year rodent cancer bioassays of the National Toxicology Program. If successful, the results may form the basis of a more efficient and economical approach for evaluating the carcinogenic activity of chemicals. Microarray analysis was performed using Affymetrix Mouse Genome 430 2.0 arrays on three to four mice per treatment group; a total of 70 mice were analyzed and used as the MAQC-II training set (GEO Series GSE6116). Data from another set of 88 mice were collected later and provided as the MAQC-II external validation set (this Series). The training dataset (GSE6116) had already been deposited in GEO by its provider. This Series contains 88 samples.
Project description: Esophageal cancer is a highly malignant cancer that is prevalent worldwide. The current TNM staging system is insufficient for the prognosis of patients with esophageal squamous cell carcinoma (ESCC). The aim of this study was to evaluate the miRNA expression profile of ESCC and to identify a miRNA signature that robustly predicts the survival of ESCC patients. miRNA expression profiles of paired frozen tissues from 119 ESCC patients were assessed by microarray. After normalization of the microarray data, the patients were randomly divided into a training set (n=60) and a test set (n=59). From the training set, we identified a four-miRNA prognostic signature (hsa-miR-218-5p, hsa-miR-142-3p, hsa-miR-150-5p, and hsa-miR-205-5p) using the random forest supervised classification algorithm and the nearest shrunken centroid algorithm. This signature separated the patients into high-risk and low-risk groups whose overall survival differed significantly (5-year survival 7.4% vs. 66.7%, p<0.001). The prognostic value of the signature was validated in the test set (5-year survival 18.8% vs. 46.5%, p=0.025) and further in an independent cohort of 58 patients assessed on a different platform (5-year survival 11.4% vs. 56.7%, p=0.003). Multivariable Cox regression analysis showed that the signature is an independent prognostic factor for ESCC patients, and stratified analysis showed that it predicts survival within TNM stages. The expression levels of the four miRNAs measured by microarray were verified by qRT-PCR and showed strong positive correlation (Pearson correlation coefficient > 0.75, p<0.001 for all). Our results suggest that the four-miRNA signature can serve as a reliable biomarker to predict the survival of ESCC patients.
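The abstract names the nearest shrunken centroid algorithm but does not give its settings. A minimal sketch of the standard formulation follows, assuming each sample is a list of expression values (for example, miRNA levels) with a class label such as high or low risk; the function names `fit_shrunken_centroids` and `predict` and the shrinkage threshold `delta` are hypothetical. Each class centroid is standardized against the overall centroid, soft-thresholded by `delta`, and a new sample is assigned to the class with the smallest standardized distance, adjusted for class priors.

```python
import math
from statistics import median
from typing import Dict, Hashable, List, Sequence


def soft_threshold(d: float, delta: float) -> float:
    """Shrink d toward zero by delta: sign(d) * max(|d| - delta, 0)."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def fit_shrunken_centroids(X: Sequence[Sequence[float]], y: Sequence[Hashable], delta: float) -> dict:
    """Fit a nearest shrunken centroid model (assumes >= 2 classes, n > K, and s0 > 0)."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    centroids = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                 for k, rows in members.items()}
    # Pooled within-class standard deviation of each feature.
    s = []
    for i in range(p):
        ss = sum((r[i] - centroids[k][i]) ** 2 for k, rows in members.items() for r in rows)
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # offset that keeps low-variance features from dominating
    shrunk: Dict[Hashable, List[float]] = {}
    for k, rows in members.items():
        m_k = math.sqrt(1.0 / len(rows) - 1.0 / n)
        shrunk[k] = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroids[k][i] - overall[i]) / scale  # standardized class difference
            shrunk[k].append(overall[i] + scale * soft_threshold(d, delta))
    priors = {k: len(rows) / n for k, rows in members.items()}
    return {"shrunk": shrunk, "s": s, "s0": s0, "priors": priors}


def predict(model: dict, x: Sequence[float]) -> Hashable:
    """Assign x to the class with the smallest prior-adjusted discriminant score."""
    best, best_score = None, math.inf
    for k, centroid in model["shrunk"].items():
        score = sum((xi - ci) ** 2 / (si + model["s0"]) ** 2
                    for xi, ci, si in zip(x, centroid, model["s"]))
        score -= 2.0 * math.log(model["priors"][k])
        if score < best_score:
            best, best_score = k, score
    return best
```

Raising `delta` shrinks more features entirely back to the overall centroid, where they no longer separate the classes; this is how the method can reduce a classifier to a small set of features.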
Project description: Chronic lymphocytic leukemia (CLL) is a heterogeneous malignancy characterized by a variable clinical course. While clinical and laboratory parameters are increasingly used to refine prognosis, they do not accurately predict response to commonly used therapies. We used gene expression profiling to generate and further refine prognostic and predictive markers. Genomic signatures reflecting progressive disease and response to chemotherapy or chemo-immunotherapy were created using cancer cell lines and patient leukemia samples. We validated these signatures using independent clinical data from four separate cohorts representing a total of 301 CLL patients. A prognostic genomic signature created from patient leukemic cell gene expression data, coupled with clinical parameters, could statistically differentiate patients with stable or progressive disease in the training dataset. The progression signature was then validated in two independent datasets, demonstrating a capacity to accurately identify patients at risk for progressive disease. In addition, two distinct genomic signatures that predict response to chlorambucil or to pentostatin, cyclophosphamide, and rituximab were generated and shown to accurately distinguish responding from non-responding CLL patients. Microarray analysis of CLL patients' lymphocytes can be used to refine prognosis and predict response to different therapies. These results have direct implications for standard and investigational therapeutics in CLL patients. Experiment Overall Design: For the predictive genomic signature of response to pentostatin, cyclophosphamide, and rituximab, 20 CLL leukemia samples were used in the training set and 20 CLL leukemia samples were used in the validation set.
Project description: Samples were prospectively collected from patients with histologically normal surgical resection margins. The training set comprised 96 tissue samples (histologically normal margins, oral carcinoma, and adjacent normal tissues) from 24 patients. Our study design was guided by the hypothesis that expression of genes present in oral squamous cell carcinoma (OSCC) but not in healthy oral tissues would indicate recurrence before any histological alteration appears. We used a meta-analysis of five published microarray data sets (GEO accession GDS2520, Kuriakose et al. 2004; GDS1584, Toruner et al. 2004; GSE6791, Pyeon et al. 2007; GSE9844, Ye et al. 2008; and GSE10121, Sticht et al. 2008), together with the current training set, to identify genes reliably over-expressed in OSCC. This reduced gene set was used to train a risk model that predicts recurrence from over-expression of a subset of these genes in histologically normal surgical resection margins. The risk signature was validated by quantitative real-time reverse-transcription PCR in 136 samples from an independent cohort of 30 patients. This was a case-only design with a training set of 23 tumors and 73 margins from 24 patients with squamous cell carcinoma of the tongue.
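The abstract does not say how the five published data sets and the training set were combined, or how the risk model scores a margin. As one hedged illustration, the sketch below assumes a simple vote-counting rule (a gene is retained if it is called over-expressed in OSCC in at least `min_votes` data sets) and a risk score equal to the number of retained genes whose fold change in a margin, relative to a hypothetical healthy-tissue reference, meets a threshold. The function names, gene identifiers, and thresholds are all hypothetical and are not the authors' method.

```python
from typing import Dict, Iterable, Set


def reliably_overexpressed(dataset_calls: Iterable[Set[str]], min_votes: int) -> Set[str]:
    """Keep genes called over-expressed in at least `min_votes` data sets."""
    votes: Dict[str, int] = {}
    for calls in dataset_calls:  # one set of over-expressed gene IDs per data set
        for gene in calls:
            votes[gene] = votes.get(gene, 0) + 1
    return {gene for gene, count in votes.items() if count >= min_votes}


def margin_risk_score(fold_change: Dict[str, float], genes: Set[str], threshold: float = 2.0) -> int:
    """Count retained genes whose fold change in a margin meets the threshold.

    Genes missing from `fold_change` are treated as unchanged (fold change 1.0).
    """
    return sum(1 for gene in genes if fold_change.get(gene, 1.0) >= threshold)
```

Under these assumptions, six call sets would correspond to the five published data sets plus the training set, and a margin with a higher score would be flagged as higher risk.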
Project description: Cytogenetic abnormalities (CA) are important clinical parameters in various types of cancer, including multiple myeloma (MM). We developed a model to predict CA in patients with MM using gene expression profiling (GEP) and validated it with different cytogenetic techniques. The model achieved an accuracy of up to 0.89. These results provide proof of concept for the hypothesis that GEP could serve as a one-stop data source for clinical molecular diagnosis and/or prognosis. As the training set, 92 paired RNA-DNA samples were hybridized to Affymetrix U133 Plus 2.0 and Agilent 244K aCGH arrays; another 23 paired samples were used as the test set.