ABSTRACT: This series brings together several datasets (training set, test set, validation set, longitudinal set, separated cell set) to identify and characterise a transcriptional signature specific to patients with active TB, distinct from patients with latent TB and healthy controls. The training set was used to identify a whole blood transcriptional signature for active TB patients in London, across a range of ethnicities. This signature was then validated in an independent cohort of patients, also recruited in London (the test set), and further confirmed in an additional independent cohort recruited in Cape Town, South Africa (the validation set), to confirm that the defined signature was present in both high incidence (Cape Town, South Africa) and medium incidence (London, UK) regions. The longitudinal dataset was then used to explore how successful TB treatment modifies this transcriptional signature. The separated cell set compares the transcriptional profiles of purified cell subsets (neutrophils, monocytes and T cells) to assess which cell types contribute to the whole blood signature, and in what way. These studies may ultimately help to improve the diagnosis of active tuberculosis, which normally relies on culture of the bacilli. Culture can take up to 6 weeks, and sometimes the bacilli cannot be obtained from sputum, requiring invasive techniques such as bronchoalveolar lavage (BAL). In some cases (30%) the bacilli cannot be grown from sputum or BAL. Any diagnostic tool would need to be valid across a range of ethnicities and in both high and low incidence countries.
A further aim was to determine whether latent TB patients have a distinct homogeneous or heterogeneous signature. Present tests (the tuberculin skin test, TST, or the interferon-gamma release assay, IGRA, which measures IFN-gamma production by blood cells in response to MTb antigens) cannot determine whether the mycobacteria have been cleared or are still present but controlled by an active immune response, nor can they predict which patients will develop active TB. Defining heterogeneity in the latent TB patients would be an important step in developing diagnostics which could detect those most at risk of developing active TB, and thus enable targeted preventive therapy. Such risk may be indicated where latent patients have a blood transcriptional signature similar to that of active patients. The transcriptional signature in whole blood and cell subsets from active TB patients may also provide information on the factors leading to immunopathogenesis, thus possibly identifying therapeutic targets. The transcriptional profile in latent TB may give information regarding protective factors controlling the infection, important for vaccine development. Finally, definition of a transcriptional signature which responds to therapy could facilitate the development of surrogate biomarkers for drug or vaccine studies. Since any active TB signature may reflect common inflammatory responses evoked during many diseases, we also performed analysis of significance, comparing transcriptional profiles from patients with TB to those from patients with other bacterial and inflammatory diseases to identify a TB-specific signature. The resulting signature was then tested against patients normalized to their own controls from 7 independent datasets: TB (Training and Validation Sets), Staphylococcus infection, Group A Streptococcus infection, Still's disease, and adult and pediatric SLE.
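As an illustration of what normalizing patients to their own controls could look like, the following minimal sketch expresses each patient's probe intensity as a log2 ratio to the median of the control samples from the same dataset. This is an assumption for illustration only: the abstract does not state the exact formula, and the function name, the data layout (nested dictionaries keyed by sample and probe) and the choice of the median as baseline are all hypothetical.

```python
from math import log2
from statistics import median


def normalise_to_controls(patients, controls):
    """Return log2(patient / median control) per probe.

    patients, controls: dict of sample name -> dict of probe -> positive intensity.
    Both must come from the same dataset so each cohort is compared only
    with its own controls (hypothetical layout, not from the source).
    """
    # Take the probe list from the first control sample.
    probes = list(next(iter(controls.values())))
    # Per-probe baseline: median intensity across this dataset's controls.
    baseline = {
        probe: median(sample[probe] for sample in controls.values())
        for probe in probes
    }
    return {
        name: {probe: log2(sample[probe] / baseline[probe]) for probe in probes}
        for name, sample in patients.items()
    }
```

Applying the function separately to each dataset puts the cohorts on a common "change relative to local controls" scale before a signature is compared across them.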
This SuperSeries is composed of the following subset Series:
GSE19435: Transcriptional profiles in Blood of patients with Tuberculosis - Longitudinal Study
GSE19439: Blood Transcriptional Profiles in Active and Latent Tuberculosis UK (Training Set)
GSE19442: Blood Transcriptional Profiles of TB in South Africa
GSE19443: Blood Transcriptional Profiles of Active TB (UK Test Set Separated)
GSE19444: Blood Transcriptional Profiles of Active and Latent TB (UK Test Set)
GSE22098: Whole blood transcriptional profiles of patients with active tuberculosis (TB) and other inflammatory and infectious diseases

Active Pulmonary TB: PTB - All patients were confirmed by isolation of Mycobacterium tuberculosis on culture of sputum or bronchoalveolar lavage fluid.
Latent TB: LTB - All patients were screened at a tuberculosis clinic, being either new entrants to the UK from endemic countries or household contacts of infectious cases, or, in the case of the validation set recruited in South Africa, residents of a high incidence country. All UK patients were positive by tuberculin skin test (>14mm if BCG vaccinated, >5mm if not vaccinated) and were also positive by Interferon-Gamma Release Assay (IGRA), specifically the Quantiferon Gold In-Tube Assay (Cellestis, Australia). The South African latent TB patients were all positive by IGRA, specifically the Quantiferon Gold In-Tube Assay. Latent patients had no clinical, radiological or microbiological evidence of active infection and were asymptomatic.
Healthy controls - These were volunteers without exposure to TB who were negative by both tuberculin skin test (<15mm if BCG vaccinated, <6mm if unvaccinated) and IGRA (as described above).

Experimental variables: Patient group: Active PTB, Latent TB, Healthy controls (BCG vaccinated and unvaccinated). Ethnicity - a wide range of ethnic groups is represented.
The active PTB group incorporates a range of smear positive and smear negative disease and a spectrum of disease extent/severity.

Experimental methods: Whole blood was collected into Tempus tubes (Applied Biosystems, Foster City, CA, USA) and stored between -20 degrees Celsius and -80 degrees Celsius before RNA extraction. For the training set cohort, and for the baseline samples of the active TB patients in the longitudinal cohort, total RNA was isolated from whole blood using the PerfectPure RNA Blood kit (5 PRIME Inc, Gaithersburg, MD, USA). For the separated cell samples, total RNA was isolated using the Qiagen RNeasy Mini Kit. For all other cohorts, total RNA was isolated from whole blood using the MagMAX 96 well RNA isolation kit (Applied Biosystems, Foster City, CA, USA). Isolated total RNA was then globin reduced using the GLOBINclear 96-well format kit (Ambion, Austin, TX, USA) according to the manufacturer's instructions. Total and globin-reduced RNA integrity was assessed using an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). Biotinylated, amplified RNA targets (cRNA) were then prepared from the globin-reduced RNA using the Illumina CustomPrep RNA amplification kit (Ambion, Austin, TX, USA). Labeled cRNA was hybridized overnight to Sentrix HT12 V3 BeadChip arrays (>48,000 probes, Illumina Inc, San Diego, CA, USA), washed, blocked, stained and scanned on an Illumina BeadStation 500 following the manufacturer's protocols. Illumina's BeadStudio version 2 software was used to generate signal intensity values from the scans, subtract background, and scale each microarray to the median average intensity for all samples (per-chip normalisation). These normalised data were used for all subsequent data analysis.
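The per-chip normalisation step can be illustrated with a minimal sketch. It assumes that "median average intensity for all samples" means the median, across arrays, of each array's mean background-subtracted intensity, and that each array is multiplied by a single scaling factor to reach that target. This is not BeadStudio's code; the function name and data layout are hypothetical.

```python
from statistics import mean, median


def per_chip_normalise(chips):
    """Scale every array so its mean intensity equals the cross-array median mean.

    chips: dict of sample name -> list of background-subtracted intensities
    (hypothetical layout; one list per microarray).
    """
    # Average intensity of each individual array.
    chip_means = {name: mean(values) for name, values in chips.items()}
    if any(m == 0 for m in chip_means.values()):
        raise ValueError("cannot scale an array whose mean intensity is zero")
    # Target level: median of the per-array averages (assumed interpretation).
    target = median(chip_means.values())
    # One multiplicative factor per array, applied to all of its probes.
    return {
        name: [value * target / chip_means[name] for value in values]
        for name, values in chips.items()
    }
```

After scaling, all arrays share the same mean intensity, so between-array differences in overall brightness no longer dominate probe-level comparisons.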