Project description:

Background: Many microarray experiments search for genes with differential expression between a common “reference” group and multiple test groups, as in time-course designs or comparisons of various treatments against a control condition. In such cases, currently employed statistical approaches based on the t-test or its close derivatives have limited efficacy, mostly because noise is estimated from only two groups at a time. Alternative approaches based on ANOVA correctly capture noise from all the groups, but do not compare individual test groups with the reference. We therefore conceived a statistical test for pairwise comparisons between the reference group and each test group that uses within-group variance calculated from all the groups.

Results: We implemented an R-Bioconductor package named Mulcom, with a statistical test derived from Dunnett’s test, designed to compare multiple experimental groups against a common reference. In addition to the basic Dunnett’s t value, the package includes an optional minimal fold-change threshold, m. Thanks to automated, permutation-based estimation of the False Discovery Rate (FDR), the package also permits fast optimization of the test, to obtain the maximum number of significant genes at a given FDR value. When applied to a time-course experiment profiled in parallel on two microarray platforms, and compared with currently used tests, Mulcom displayed higher concordance of significant genes between the two array platforms, and higher enrichment in functional annotation for categories related to the biology of the experiment.

Conclusions: The Mulcom package provides a fast and powerful tool for the identification of differentially expressed genes when several experimental conditions are compared with a common reference. We found that Mulcom leads to lists of differentially expressed genes that are particularly consistent across microarray platforms and enriched in significant classes of genes. In our opinion, there are three main reasons for this good performance: (i) within-group variability is estimated from all experimental groups even if only two of them are compared each time; (ii) the optional fold-change threshold m avoids false positives due to aberrantly low within-group variability; (iii) automated test optimization allows sensitivity to be maximized without compromising specificity.

Ten MDA-MB-435 samples, biological duplicates of each condition (untreated, integrin Beta4 treatment, hepatocyte growth factor treatment for 1 hr, 6 hrs, or 24 hrs).
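The test outlined in this description can be illustrated with a minimal Python sketch. It is not the code of the Mulcom package, which is written for R-Bioconductor. The function names `mulcom_like_scores` and `permutation_fdr` are hypothetical, and two details are assumptions made for illustration: the form of the score, (|mean_test - mean_ref| - m) / sqrt(MSE * (1/n_test + 1/n_ref)), and the use of the median count over label permutations as the expected number of false calls.

```python
import numpy as np

def mulcom_like_scores(expr, groups, reference, m=0.0):
    """Dunnett-type score per gene for each test group versus the reference.

    expr      : genes x samples array of log-scale expression values
    groups    : one group label per sample (column of expr)
    reference : label of the common reference group
    m         : minimal fold-change threshold, on the same scale as expr
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    labels = list(dict.fromkeys(groups.tolist()))  # unique labels, order kept

    # Within-group sum of squares pooled over ALL groups, not just two.
    ss_within = np.zeros(expr.shape[0])
    means, sizes = {}, {}
    for g in labels:
        block = expr[:, groups == g]
        means[g] = block.mean(axis=1)
        sizes[g] = block.shape[1]
        ss_within += ((block - means[g][:, None]) ** 2).sum(axis=1)
    mse = ss_within / (expr.shape[1] - len(labels))  # within-group mean square

    scores = {}
    for g in labels:
        if g == reference:
            continue
        se = np.sqrt(mse * (1.0 / sizes[g] + 1.0 / sizes[reference]))
        # Subtracting m penalises small differences with tiny variance.
        scores[g] = (np.abs(means[g] - means[reference]) - m) / se
    return scores

def permutation_fdr(expr, groups, reference, t_cut, m=0.0, n_perm=100, seed=0):
    """Estimate the FDR at score cutoff t_cut by shuffling sample labels."""
    rng = np.random.default_rng(seed)

    def count_calls(labels):
        s = mulcom_like_scores(expr, labels, reference, m)
        return sum(int((v > t_cut).sum()) for v in s.values())

    observed = count_calls(np.asarray(groups))
    if observed == 0:
        return 0, float("nan")  # FDR is undefined when nothing is called
    null_counts = [count_calls(rng.permutation(groups)) for _ in range(n_perm)]
    fdr = min(1.0, float(np.median(null_counts)) / observed)
    return observed, fdr
```

Under these assumptions, the optimization mentioned in the description would amount to scanning a grid of `(t_cut, m)` pairs and keeping the pair with the most calls whose estimated FDR stays below the chosen level. The sketch does not guard against genes with zero within-group variance, which would produce infinite scores.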

Project description: This series groups several datasets (training set, test set, validation set, longitudinal set, separated cell set) to identify and characterise a specific transcriptional signature for patients with active TB, distinct from patients with latent TB and healthy controls. The training set was used to identify a whole blood transcriptional signature for active TB patients in London, across a range of ethnicities. This signature was then validated in an independent cohort of patients, also recruited in London (the test set), and further confirmed in an additional independent cohort recruited in Cape Town, South Africa (the validation set), to confirm that the defined signature was present in both high incidence (Cape Town, South Africa) and medium incidence (London, UK) regions. The longitudinal dataset was then used to explore how successful TB treatment modifies this transcriptional signature. The separated cell set compares the transcriptional profiles of purified cell subsets (neutrophils, monocytes and T cells) to assess which cell types contribute to the whole blood signature, and in what way. These studies may ultimately help to improve the diagnosis of active tuberculosis, which normally relies on culture of the bacilli. Culture can take up to 6 weeks, and sometimes the bacilli cannot be obtained from sputum, thus requiring invasive techniques such as bronchoalveolar lavage (BAL). In some cases (30%) the bacilli cannot be grown from sputum or BAL. Any diagnostic tool would need to be valid across a range of ethnicities and in both high and low incidence countries.
A further aim was to determine whether latent TB patients have a distinct homogeneous or heterogeneous signature. Present tests (the tuberculin skin test, TST, or the IGRA assay, which measures the responsiveness of blood cells to MTb antigens through IFN-gamma production) cannot determine whether the mycobacteria have been cleared or are still present but controlled by an active immune response, nor can they predict which patients will develop active TB. Defining heterogeneity in the latent TB patients would be an important step in developing diagnostics that could detect those most at risk of developing active TB, and thus enable targeted preventive therapy. The latter situation may be identified if latent patients have a blood transcriptional signature similar to that in active patients. The transcriptional signature in whole blood and cell subsets from active TB patients may also provide information on the factors leading to immunopathogenesis, thus possibly identifying therapeutic targets. The transcriptional profile in latent TB may give information regarding protective factors controlling the infection, which is important for vaccine development. Finally, definition of a transcriptional signature that responds to therapy could facilitate the development of surrogate biomarkers for drug or vaccine studies. Since any active TB signature may reflect common inflammatory responses evoked during many diseases, we also performed analysis of significance, comparing transcriptional profiles from patients with TB to those from patients with other bacterial and inflammatory diseases to identify a TB-specific signature. The resulting signature was then tested against patients normalized to their own controls from 7 independent datasets: TB (Training and Validation Sets), Staphylococcus infection, Group A Streptococcus infection, Still's disease, and adult and pediatric SLE.
This SuperSeries is composed of the following subset Series:
GSE19435: Transcriptional profiles in Blood of patients with Tuberculosis - Longitudinal Study
GSE19439: Blood Transcriptional Profiles in Active and Latent Tuberculosis UK (Training Set)
GSE19442: Blood Transcriptional Profiles of TB in South Africa
GSE19443: Blood Transcriptional Profiles of Active TB (UK Test Set Separated)
GSE19444: Blood Transcriptional Profiles of Active and Latent TB (UK Test Set)
GSE22098: Whole blood transcriptional profiles of patients with active tuberculosis (TB) and other inflammatory and infectious diseases

Active Pulmonary TB: PTB - All patients were confirmed by isolation of Mycobacterium tuberculosis on culture of sputum or bronchoalveolar lavage fluid.

Latent TB: LTB - All patients were screened at a tuberculosis clinic, being either new entrants to the UK from endemic countries or household contacts of infectious cases or, in the case of the validation set recruited in South Africa, residents of a high incidence country. All UK patients were positive by tuberculin skin test (>14mm if BCG vaccinated, >5mm if not vaccinated) and were also positive by Interferon-Gamma Release Assay (IGRA), specifically the Quantiferon Gold In-Tube Assay (Cellestis, Australia). The South African latent TB patients were all positive by IGRA, specifically the Quantiferon Gold In-Tube Assay. Latent patients had no clinical, radiological or microbiological evidence of active infection and were asymptomatic.

Healthy controls - These were volunteers without exposure to TB who were negative by both tuberculin skin test (<15mm if BCG vaccinated, <6mm if unvaccinated) and IGRA (as described above).

Experimental variables: Patient group: Active PTB, Latent TB, Healthy controls (BCG vaccinated and unvaccinated). Ethnicity - a wide range of ethnic groups is represented.
The active PTB group incorporates a range of smear positive and smear negative disease and a spectrum of disease extent/severity.

Experimental methods: Whole blood was collected into Tempus tubes (Applied Biosystems, Foster City, CA, USA) and stored between -20 degrees Celsius and -80 degrees Celsius before RNA extraction. For the training set cohort, and the baseline samples of the active TB patients in the longitudinal cohort, total RNA was isolated from whole blood using the PerfectPure RNA Blood kit (5 PRIME Inc, Gaithersburg, MD, USA). For the separated cell samples, total RNA was isolated using the Qiagen RNeasy Mini Kit. For all other cohorts, total RNA was isolated from whole blood using the MagMAX 96 well RNA isolation kit (Applied Biosystems, Foster City, CA, USA). Isolated total RNA was then globin reduced using the GLOBINclear 96-well format kit (Ambion, Austin, TX, USA) according to the manufacturer's instructions. Total and globin-reduced RNA integrity was assessed using an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). Biotinylated, amplified RNA targets (cRNA) were then prepared from the globin-reduced RNA using the Illumina CustomPrep RNA amplification kit (Ambion, Austin, TX, USA). Labeled cRNA was hybridized overnight to Sentrix HT12 V3 BeadChip arrays (>48,000 probes, Illumina Inc, San Diego, CA, USA), washed, blocked, stained and scanned on an Illumina BeadStation 500 following the manufacturer's protocols. Illumina's BeadStudio version 2 software was used to generate signal intensity values from the scans, subtract background, and scale each microarray to the median average intensity for all samples (per-chip normalisation). These normalised data were used for all subsequent data analysis.
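The per-chip normalisation step can be sketched in Python as follows. This is an illustration only, not BeadStudio's implementation, whose exact procedure is not given here. It assumes that "median average intensity" means the median, across arrays, of each array's mean background-subtracted intensity. The function name `per_chip_normalise` and the per-array `background` argument are hypothetical.

```python
import numpy as np

def per_chip_normalise(signal, background):
    """Background-subtract and rescale each array to a common average.

    signal     : probes x arrays matrix of raw intensities
    background : one background estimate per array (assumed input)
    """
    signal = np.asarray(signal, dtype=float)
    background = np.asarray(background, dtype=float)

    # Subtract each array's background from all of its probes.
    corrected = signal - background[None, :]

    # Average intensity of each array after background subtraction.
    chip_means = corrected.mean(axis=0)

    # Target level: the median of the per-array averages.
    target = np.median(chip_means)

    # Rescale every array so that its average equals the target.
    return corrected * (target / chip_means)[None, :]
```

After this step every array has the same mean intensity, so remaining differences between arrays reflect relative probe signals rather than overall brightness.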

Dataset Information

Mulcom: a multiple comparison statistical test for microarray data in Bioconductor (Illumina)
