Project description:Although principal component analysis is frequently used in multivariate analysis, it has disadvantages when applied to experimental or diagnostic data. First, the identified principal components have poor generality; since the size and directions of the components depend on the particular data set, the components are valid only within that set. Second, the method is sensitive to experimental noise and bias between sample groups, since it cannot reflect the design of experiments; rather, it assumes the same weight and independence for all the samples in the matrix. Third, the resulting components are often difficult to interpret. To address these issues, several options were introduced to the methodology. The resulting components were scaled to unify their size unit. Also, the principal axes were identified using training data sets and shared among experiments. This training data reflects the design of experiments, and its preparation allows noise to be reduced and group bias to be removed. The effects of these options were observed in microarray experiments, which showed an improvement in the separation of groups and robustness to noise. Additionally, unknown samples were appropriately classified using pre-arranged axes, and the principal axes reflected the characteristics of the groups in the experiments well. The summarized levels of the genes are presented in matrix form.
Project description:Although principal component analysis is frequently used in multivariate analysis, it has disadvantages when applied to experimental or diagnostic data. First, the identified principal components have poor generality; since the size and directions of the components depend on the particular data set, the components are valid only within that set. Second, the method is sensitive to experimental noise and bias between sample groups, since it cannot reflect the design of experiments; rather, it assumes the same weight and independence for all the samples in the matrix. Third, the resulting components are often difficult to interpret. To address these issues, several options were introduced to the methodology. The resulting components were scaled to unify their size unit. Also, the principal axes were identified using training data sets and shared among experiments. This training data reflects the design of experiments, and its preparation allows noise to be reduced and group bias to be removed. The effects of these options were observed in microarray experiments, which showed an improvement in the separation of groups and robustness to noise. Additionally, unknown samples were appropriately classified using pre-arranged axes, and the principal axes reflected the characteristics of the groups in the experiments well. This SuperSeries is composed of the SubSeries listed below.
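The idea of identifying principal axes on training data and sharing them among experiments can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_axes` and `project`, the synthetic data, and the choice to scale scores by the square root of the number of genes are all assumptions made for illustration, since the description does not state how the components were scaled.

```python
import numpy as np

def fit_axes(train, n_components=2):
    """Learn principal axes from a training matrix (samples x genes)."""
    center = train.mean(axis=0)
    # SVD of the centered training data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(train - center, full_matrices=False)
    return center, vt[:n_components]

def project(samples, center, axes):
    """Project samples onto fixed axes learned elsewhere.

    Dividing by sqrt(number of genes) is an assumed scaling that keeps
    scores comparable across gene sets of different size.
    """
    scores = (samples - center) @ axes.T
    return scores / np.sqrt(samples.shape[1])

rng = np.random.default_rng(0)

# Hypothetical training set: two groups of 5 samples over 100 genes,
# where group B is shifted in the first 10 genes.
group_a = rng.normal(0.0, 0.1, size=(5, 100))
group_b = rng.normal(0.0, 0.1, size=(5, 100))
group_b[:, :10] += 2.0
train = np.vstack([group_a, group_b])

center, axes = fit_axes(train)
train_scores = project(train, center, axes)

# An unknown sample that resembles group B is placed on the same axes
# without refitting, so its position is comparable to the training groups.
unknown = rng.normal(0.0, 0.1, size=(1, 100))
unknown[:, :10] += 2.0
unknown_score = project(unknown, center, axes)
```

Because the axes are fixed after training, adding or removing test samples does not change the coordinate system, which is the property the description attributes to pre-arranged axes.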
Project description:Transcriptomic data were generated for mouse tumors of each combination of genotypes that included either presence or absence of heterozygosity for hSS2 in Rosa26 and homozygosity for wildtype or floxed alleles of Smarcb1. Each genotype clustered most closely with itself in pairwise comparisons of whole transcriptomes and principal component analysis. Overall, principal component analysis, k-means clustering, and KEGG pathway analysis all demonstrated that tumors generated by genetic disruption of Smarcb1 differ from tumors with SS18-SSX expression only.
Project description:In a supervised principal component analysis, histiocytes from TCHRBCL were most closely related to epithelioid cells from NLPHL, with both types of cells expressing genes related to proinflammatory and regulatory macrophage activity.
Project description:Background and aim: Analysis of data obtained from genome-wide gene expression experiments is challenging due to the large number of variables, the management of the data, and the need for multivariate analysis. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data to overcome these problems. In a first step, principal component analysis is applied to give an overview of any differences between the observations and possible groupings. The next step is interpretation of the principal components with respect to both biological function and involvement of predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation. Illustrative plots of PCA scores and Gene Ontology terms are available. To illustrate the functionality of the R package, we designed a serum stimulation experiment, where the main biological outcome is well documented. Results: Samples from the serum stimulation experiment were analyzed using the Affymetrix Human Genome U133 Plus 2.0 chip. The array data were analyzed with the tools of the pcaGoPromoter package, which resulted in a clear separation of the observations into the three experimental groups: controls, serum only, and serum with inhibitor. The functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, such as cell cycle progression, and the predicted involvement of the expected transcription factors, including E2F. In addition, unexpected results were uncovered, e.g. cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor-treated cells. Conclusion: The pcaGoPromoter R package provides a collection of tools for analyzing gene expression data. It works with any platform using gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided.
The tools give an overview of the data via principal component analysis, functional interpretation by Gene Ontology terms (biological processes), and indication of involvement of possible transcription factors. Thus, pcaGoPromoter structures the high-dimensional data of gene expression experiments and can be applied to generate hypotheses for further exploration.
Project description:We report chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) for H3, H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3, and H3K36me3. ChIP-seq libraries were sequenced to a depth of at least 8 million total reads per sample. We found that H3K4me3, H3K27ac, and H3K4me1 show separation between unicellular and multicellular stages in principal component analysis (PCA); the other marks did not show a clear separation in PCA.
Project description:We classified samples and deciphered a key gene signature of intratumor heterogeneity by Principal Component Analysis and Weighted Gene Co-expression Network Analysis. At the genome level, we identified common GB copy number alterations but strong inter-individual molecular heterogeneity.
Project description:A cDNA microarray was designed and used to monitor the transcriptomic profile of Dehalococcoides mccartyi strain 195 (in a mixed community) respiring various chlorinated organics, including chloroethenes and 2,3-dichlorophenol. The cultures were continuously fed in order to establish steady-state respiration rates and substrate levels. The organization of array data into a clustered heat map revealed two major experimental partitions. This partitioning in the data set was further explored through principal component analysis. The first two principal components separated the experiments into those with slow (1.6 ± 0.6 M Cl- per h) and fast (22.9 ± 9.6 M Cl- per h) respiring cultures. Additionally, the transcripts with the highest loadings in these principal components were identified, suggesting that those transcripts were responsible for the partitioning of the experiments. By analyzing the transcriptomes (n = 53) across experiments, relationships among transcripts were identified, and hypotheses about the relationships between electron transport chain members were proposed. One hypothesis, that the hydrogenases Hup and Hym and the formate dehydrogenase-like oxidoreductase (DET0186–DET0187) form a complex (as displayed by their tight clustering in the heat map analysis), was explored using a nondenaturing protein separation technique combined with proteomic sequencing. Although these proteins did not migrate as a single complex, DET0112 (an FdhB-like protein encoded in the Hup operon) was found to comigrate with DET0187 rather than with the catalytic Hup subunit DET0110. On closer inspection of the genome annotations of all Dehalococcoides strains, the DET0185-to-DET0187 operon was found to lack a key subunit, an FdhB-like protein.
Therefore, on the basis of the transcriptomic, genomic, and proteomic evidence, the place of the missing subunit in the DET0185-to-DET0187 operon is likely filled by recruiting a subunit expressed from the Hup operon (DET0112).
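The step of identifying the transcripts with the highest loadings in a principal component can be sketched as follows. This is only an illustration under stated assumptions: the helper `top_loading_features`, the transcript labels `t0` to `t4`, and the synthetic "slow" and "fast" experiments are hypothetical and do not come from the study's data.

```python
import numpy as np

def top_loading_features(data, names, component=0, k=3):
    """Return the k features with the largest absolute loading on one
    principal component of a (samples x features) matrix."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes; each entry is a feature loading.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[component]
    order = np.argsort(-np.abs(loadings))[:k]
    return [(names[i], float(loadings[i])) for i in order]

rng = np.random.default_rng(1)

# Hypothetical data: 6 experiments x 5 transcripts. The last three
# ("fast") experiments have elevated levels of transcripts t0 and t1.
names = ["t0", "t1", "t2", "t3", "t4"]
data = rng.normal(0.0, 0.05, size=(6, 5))
data[3:, :2] += 3.0

top_two = top_loading_features(data, names, component=0, k=2)
top_names = {name for name, _ in top_two}
```

In this toy case the two transcripts that differ between the groups dominate the first component, which mirrors the reasoning that high-loading transcripts drive the partitioning of the experiments.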
Project description:We classified samples and deciphered a key gene signature of intratumor heterogeneity by Principal Component Analysis and Weighted Gene Co-expression Network Analysis. Transcriptome analysis highlighted a pronounced intratumor architecture reflecting the surgical sampling plan of the study and identified gene modules associated with hallmarks of cancer.