Dataset Information

Whole Blood Transcriptional Modules generated on Illumina Hu-6 V2 Beadchips

ABSTRACT: This dataset was used to establish whole blood transcriptional modules (n=260), groups of coordinately expressed transcripts that exhibit altered abundance within individual datasets or across multiple datasets. This modular framework was generated to reduce the dimensionality of whole blood microarray data processed on the Illumina Beadchip platform, yielding data-driven transcriptional modules with biologic meaning. This series combines nine independent datasets representing a spectrum of human pathologies expected to result in changes in transcript abundance related to changes in expression or cellular composition of whole blood. These nine datasets are composed of 410 individual whole blood profiles generated from patients with HIV, tuberculosis, sepsis, systemic lupus erythematosus, systemic arthritis, or B-cell deficiency, and from liver transplant recipients. For each dataset, healthy controls are also included.

Each dataset’s expression data was preprocessed independently. First, probes were discarded if they were not present in at least ten percent of the dataset’s samples. Then, the sample data for each dataset was normalized using the BeadStudio average normalization algorithm. Once normalized, the signal was floored such that all signals less than ten were set to ten. The signal median of all of the dataset’s samples was calculated for each probe. Probes were discarded if no sample had a difference in signal from the median that was greater than or equal to thirty, or if no sample had a fold change relative to the median that was either greater than or equal to 1.5, or less than or equal to 0.67. Finally, data was transformed to the log2 of the signal divided by the mean.

Each of the preprocessed datasets was clustered in parallel using Euclidean distance and Hartigan’s K-Means clustering algorithm, a hybrid of hierarchical and K-Means clustering algorithms.
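
The per-probe preprocessing filters described above can be illustrated with a minimal Python sketch. It is not the submitters' code. The function name, its parameters and the boolean detection flags are hypothetical; the signals are assumed to be already normalized (the BeadStudio step is not reproduced); and the "mean" in the final transformation is assumed to be the probe's mean across the dataset's samples, which the abstract does not specify.

```python
import math
from statistics import mean, median

def preprocess_probe(signals, present, min_present_frac=0.10, floor=10.0,
                     min_abs_diff=30.0, fc_up=1.5, fc_down=0.67):
    """Return log2 ratios for one probe, or None if the probe is discarded.

    signals: normalized signal per sample (assumed already normalized).
    present: per-sample detection flags (hypothetical input format).
    """
    # Discard probes present in fewer than ten percent of the samples.
    if sum(present) / len(present) < min_present_frac:
        return None
    # Set all signals below the floor to the floor value.
    floored = [max(s, floor) for s in signals]
    # Per-probe median across the dataset's samples.
    med = median(floored)
    # Keep the probe only if some sample differs from the median by >= 30
    # and some sample shows a fold change >= 1.5 or <= 0.67.
    has_abs_diff = any(abs(s - med) >= min_abs_diff for s in floored)
    has_fold_change = any(s / med >= fc_up or s / med <= fc_down
                          for s in floored)
    if not (has_abs_diff and has_fold_change):
        return None
    # Assumption: "the mean" is the per-probe mean across samples.
    probe_mean = mean(floored)
    return [math.log2(s / probe_mean) for s in floored]

# Toy usage: a flat probe is discarded, a variable probe is kept.
flat = preprocess_probe([100, 100, 100, 100], [True] * 4)
variable = preprocess_probe([100, 100, 200, 100], [True] * 4)
```

The two variability tests are treated as separate existence checks because the abstract states them as two independent reasons to discard a probe.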
The number of clusters (k) was set to thirty, chosen to provide significant power during later module extraction steps. A higher value of k could have been chosen but was not, in order to minimize possibly arbitrary cluster splitting.

Taking the nine sets of thirty clusters as input, we constructed a weighted co-cluster graph, a probe by probe matrix where the value of each cell (the weight) is set to the number of times probe_i and probe_j are found in the same cluster. In this instance, the values range from zero to nine, inclusive. At this point, the goal is to extract sets of probes that are most frequently clustered together, proceeding from the most stringent requirements to the least. To accomplish this, we employ an iterative algorithm. To begin, the maximum clique threshold is initialized to the number of input cluster sets, the paraclique threshold is calculated, and a minimum seed size is chosen (we used ten).

The outer loop begins by creating an unweighted graph through application of the maximum clique threshold to the weighted co-cluster graph such that a probe pair, or edge, is represented in the unweighted graph if and only if the corresponding weight in the co-cluster graph equals or exceeds this threshold. We then begin the inner loop. The first step is to isolate the largest set of probes such that all pairs of probes in the set are completely connected in the unweighted graph; that is, there is no pair of probes in the set where the weight from the initial graph is smaller than the maximum clique threshold. In graph theoretic terms, the probes form a maximum clique. If the size of the probe set is smaller than the minimum seed size, we escape from the inner loop, reduce the threshold by one, and return to the beginning of the outer loop. Otherwise, the probe set is at least as large as the minimum seed size and it becomes the seed for a module.
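
As an illustration of the weighted co-cluster graph and its thresholding, here is a small Python sketch. The function names, the input layout (one list of probe-ID sets per dataset) and the toy data are assumptions made for the example, and the maximum clique search itself is not shown.

```python
from collections import Counter
from itertools import combinations

def cocluster_weights(cluster_sets):
    """Count, for each probe pair, how many datasets cluster the two together.

    cluster_sets: one entry per dataset; each entry is a list of clusters,
    and each cluster is a set of probe IDs (hypothetical input layout).
    """
    weights = Counter()
    for clusters in cluster_sets:
        for cluster in clusters:
            # Sort so each pair has one canonical key (a, b) with a < b.
            for a, b in combinations(sorted(cluster), 2):
                weights[(a, b)] += 1
    return weights

def threshold_graph(weights, threshold):
    """Unweighted graph: keep an edge iff its weight is >= threshold."""
    adjacency = {}
    for (a, b), w in weights.items():
        if w >= threshold:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency

# Toy example with three datasets instead of nine.
toy_cluster_sets = [
    [{"p1", "p2", "p3"}, {"p4"}],
    [{"p1", "p2"}, {"p3", "p4"}],
    [{"p1", "p2", "p3"}, {"p4"}],
]
toy_weights = cocluster_weights(toy_cluster_sets)
toy_graph = threshold_graph(toy_weights, threshold=3)
```

With three input cluster sets the weights range from zero to three, mirroring the zero-to-nine range described for the nine datasets. Lowering the threshold by one admits more edges, which is what each pass of the outer loop does.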
To allow for the inevitable clustering inaccuracies, we then employ the paraclique algorithm, revisiting the co-cluster graph and adding to the seed any probe that is found to cluster with at least eighty-five percent of the seed’s members a number of times equal to or exceeding the paraclique threshold. This final probe set is a module. It is removed from both graphs and named in accordance with the iterations in which it was found (e.g., a module extracted in the first iteration of the outer loop and the second iteration of the inner loop is designated M1.2). The inner loop then begins again with the reduced graphs. Those modules with conserved expression across diseases (formed by transcripts that cluster together for all nine datasets) were selected in early rounds, whereas modules with greater disease specificity (formed by transcripts that cluster together only in a subset of the nine datasets) were selected in later rounds.
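
The paraclique expansion and the module naming scheme can be sketched in Python as follows. This is one illustrative reading of the description above, not the original implementation. The function names and the weight dictionary layout are hypothetical. The abstract does not say how the paraclique threshold is calculated, so it is passed in as a parameter, and the expansion is assumed to be a single pass against the original seed.

```python
def expand_seed(seed, candidates, weights, paraclique_threshold,
                min_fraction=0.85):
    """Add to the seed every candidate probe that co-clusters with at least
    min_fraction of the seed members at weight >= paraclique_threshold.

    weights: dict keyed by (a, b) with a < b (hypothetical layout);
    missing pairs count as weight zero.
    """
    def weight(a, b):
        key = (a, b) if a < b else (b, a)
        return weights.get(key, 0)

    module = set(seed)
    for probe in candidates:
        if probe in seed:
            continue
        hits = sum(1 for member in seed
                   if weight(probe, member) >= paraclique_threshold)
        if hits >= min_fraction * len(seed):
            module.add(probe)
    return module

def module_name(outer_iteration, inner_iteration):
    """Name a module after the iterations in which it was found."""
    return f"M{outer_iteration}.{inner_iteration}"

# Toy example: a seed of ten probes, one probe linked to nine of them
# (kept) and one linked to eight of them (rejected, 8 < 0.85 * 10).
toy_seed = {f"s{i}" for i in range(10)}
toy_weights = {(f"s{i}", "x"): 8 for i in range(9)}
toy_weights.update({(f"s{i}", "y"): 8 for i in range(8)})
toy_module = expand_seed(toy_seed, {"x", "y"}, toy_weights,
                         paraclique_threshold=8)
```

In the toy data, probe "x" joins the module and probe "y" does not, and `module_name(1, 2)` yields the designation M1.2 used in the text.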

ORGANISM(S): Homo sapiens

SUBMITTER: Damien Chaussabel

PROVIDER: E-GEOD-29536 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Publications

Systems scale interactive exploration reveals quantitative and qualitative differences in response to influenza and pneumococcal vaccines.

Obermoser G, Presnell S, Domico K, Xu H, Wang Y, Anguiano E, Thompson-Snipes L, Ranganathan R, Zeitner B, Bjork A, Anderson D, Speake C, Ruchaud E, Skinner J, Alsina L, Sharma M, Dutartre H, Cepika A, Israelsson E, Nguyen P, Nguyen QA, Harrod AC, Zurawski SM, Pascual V, Ueno H, Nepom GT, Quinn C, Blankenship D, Palucka K, Banchereau J, Chaussabel D

Immunity, 2013-04-01, issue 4

Systems immunology approaches were employed to investigate innate and adaptive immune responses to influenza and pneumococcal vaccines. These two non-live vaccines show different magnitudes of transcriptional responses at different time points after vaccination. Software solutions were developed to explore correlates of vaccine efficacy measured as antibody titers at day 28. These enabled a further dissection of transcriptional responses. Thus, the innate response, measured within hours in the p ...

PMID: 23601689

Similar Datasets

Project description: 450k ARRAY EXPANDED ANNOTATION PLATFORM: SEARCH GPL16304

Background: Measurement of genome-wide DNA methylation (DNAm) has become an important avenue for investigating potentially functional changes in various pathological conditions. Illumina Infinium is a relatively inexpensive and user-friendly DNAm microarray platform used by many researchers to measure DNAm on a large scale. However, it has been suggested that a subset of probes may give rise to misleading results due to issues related to probe design. To facilitate biologically significant data interpretation, we set out to enhance probe annotation of the newest HumanMethylation450 BeadChip array (with >485,000 probes covering 99% of RefSeq genes).

Results: Annotation that was added or expanded on includes 1) SNPs documented in the probe target, 2) probe binding specificity, 3) CpG classification of target sites and 4) gene feature classification of target sites. Probes with documented SNPs within 10bp of the target site, and especially those with documented SNPs at the target CpG, were associated with increased within-tissue variation in DNAm. An example of a probe with a SNP at the target CpG was used to demonstrate how sample genotype can confound the measurement of DNAm. 8.6% of probes were identified as non-specific; in other words, these probes map to multiple locations in silico. DNAm measured from these non-specific probes likely represents a combination of DNAm from multiple genomic sites. The expanded biological annotation demonstrated that, based on DNAm, grouping probes by alternative CpG classes rather than UCSC islands provides a more distinctive classification system of CpG sites. Finally, variable enrichment for tissue-specific differentially methylated probes was noted across CpG classes and gene feature groups, depending on the tissues that were compared.

Conclusion: Probes containing SNPs and non-specific probes may affect the assessment of DNAm using the 450k array. Additionally, CpG enrichment classes and, to a lesser extent, gene feature groups were associated with distinct patterns of DNAm. Thus, we recommend that confounded probes be removed from analyses and that genomic trends be considered in analyses of the Illumina HumanMethylation450 BeadChip. DNAm arrays offer a powerful approach for which thoughtful use of probe content can be utilized to better understand the biological processes affected.
