Project description:
BACKGROUND: Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. The Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. As in other microarray experiments, methylation data are susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to normalization and batch effect correction for this kind of data.
METHODS: We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects, all generated on the HumanMethylation27 platform: quantile normalization at the average β value (QNβ); two-step quantile normalization at probe signals as implemented in the "lumi" package of R (lumi); and quantile normalization of the A and B signals separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated.
RESULTS: Each normalization method removed a portion of the batch effects, and effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects, and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Datasets 2 and 3, respectively. After QNβ, lumi, or ABnorm, the percentage of CpGs associated with batch effects was reduced to 24, 32, and 26 percent for Dataset 2, and to 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed the remaining non-biological effects. More importantly, the two-step procedure (normalization followed by EB correction) almost tripled the number of CpGs associated with the outcome of interest for the two datasets.
CONCLUSION: Genome-wide methylation data from the Infinium Methylation BeadChip can be susceptible to batch effects, with profound impacts on downstream analyses and conclusions. Normalization can reduce part but not all of the batch effects. EB correction along with normalization is recommended for effective batch effect removal.
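
As a rough illustration of the quantile normalization idea evaluated above, the following Python sketch forces every sample (column) of a probe-by-sample matrix of β values to share the same empirical distribution. It is a generic textbook version, not the "lumi" implementation or the authors' pipeline; the function name, the input layout, and the simple tie handling (ties are broken by row order) are assumptions made for this example.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a probes-by-samples matrix (list of rows).

    Every sample (column) is mapped onto a shared reference
    distribution: the mean, across samples, of the values at each rank.
    Ties are broken by row order, which is a simplification.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # Extract each sample as its own list of values.
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: average of the sorted samples, rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(sc[i] for sc in sorted_cols) for i in range(n_rows)]

    # Replace each value by the reference value at its within-sample rank.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Toy values for illustration only (three probes, two samples).
toy = [[0.2, 0.9],
       [0.8, 0.1],
       [0.5, 0.5]]
normalized_toy = quantile_normalize(toy)
```

After normalization both toy samples contain the same set of values and differ only in which probe carries which value. Normalization of this kind equalizes distributions across samples; as the abstract notes, it does not by itself remove all batch effects.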

Project description:
Background: With the advent of array-based techniques to measure methylation levels in primary tumor samples, systematic investigations of methylomes have been widely performed on a large number of tumor entities. Most of these approaches do not measure methylation in individual cells but in bulk tumor sample DNA, which contains a mixture of tumor cells, infiltrating immune cells, and other stromal components. This raises questions about the purity of a given tumor sample, given the varying degrees of stromal infiltration in different entities. Previous methods to infer tumor purity require, or are based on, matching control samples, which are rarely available. Here we present a novel, reference-free method to quantify tumor purity, based on two Random Forest classifiers that were trained on ABSOLUTE and ESTIMATE purity values from TCGA tumor samples. We subsequently apply this method to a previously published, large dataset of brain tumors, showing that these models perform well in datasets that have not been characterized with respect to tumor purity.
Results: Using two gold standard methods to infer purity, the ABSOLUTE score based on whole genome sequencing data and the ESTIMATE score based on gene expression data, we optimized Random Forest classifiers to predict tumor purity in entities that were contained in the TCGA project. We validated these classifiers using an independent test data set and cross-compared them to other methods that have been applied to the TCGA datasets (such as ESTIMATE and LUMP). Using Illumina methylation array data of brain tumor entities (as published in Capper et al. (Nature 555:469-474, 2018)), we applied this model to estimate tumor purity and found that subgroups of brain tumors display substantial differences in tumor purity.
Conclusions: Random forest-based tumor purity prediction is a well-suited tool to extrapolate gold standard measures of purity to novel methylation array datasets. In contrast to other available methylation-based tumor purity estimation methods, our classifiers do not need a priori knowledge about the tumor entity or matching control tissue to predict tumor purity.

Project description:
Formalin-fixed, paraffin-embedded (FFPE) samples are a highly desirable resource for epigenetic studies, but there is no suitable platform to assay genome-wide methylation in these widely available resources. Recently, Thirlwell et al. (2010) reported a modified ligation-based DNA repair protocol to prepare FFPE DNA for the Infinium methylation assay. In this study, we tested the accuracy of methylation data obtained with this modification by comparing paired fresh-frozen (FF) and FFPE colon tissue (normal and tumor) from colorectal cancer patients. We report locus-specific correlation and concordance of tumor-specific differentially methylated loci (DML), neither of which was previously assessed.
We used Illumina's Infinium Methylation 27K chip for 12 pairs of FF and 12 pairs of FFPE tissue from tumor and surrounding healthy tissue from the resected colon of the same individual, after repairing the FFPE DNA using Thirlwell's modified protocol.
For both tumor and normal tissue, the overall correlation of β values between all loci in paired FF and FFPE samples was comparable to previous studies. Tissue storage type (FF or FFPE), rather than tissue type (normal or tumor), was found to be the most significant source of variation. We found a large number of DML between FF and FFPE DNA. Using ANOVA, we also identified DML in tumor compared to normal tissue in both FF and FFPE samples; of the top 50 loci in both groups, only 7 were common, indicating poor concordance. Likewise, when looking at the correlation of individual loci between FFPE and FF across the patients, less than 10% of loci showed strong correlation (r ≥ 0.6). Finally, we checked the effect of the ligation-based modification on the Infinium chemistry for SNP genotyping on an independent set of samples, which also showed poor performance.
Ligation of FFPE DNA prior to the Infinium genome-wide methylation assay may detect a reasonable number of loci, but far fewer loci are detected than in FF samples. More importantly, the concordance of DML detected between FF and FFPE DNA is suboptimal, and DML from FFPE tissues should be interpreted with great caution.
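
The locus-level comparison described above (correlating each locus between paired FF and FFPE samples across patients, then counting loci above a correlation threshold of 0.6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the dictionary layout (locus ID mapped to a list of β values in a shared patient order), and the locus IDs and values in the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns NaN when either sequence has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def fraction_strongly_correlated(ff, ffpe, threshold=0.6):
    """Fraction of shared loci whose FF/FFPE correlation meets the threshold.

    ff, ffpe: dicts mapping a locus ID to beta values, one per patient,
    with patients in the same order in both dicts (an assumed layout).
    """
    shared = [locus for locus in ff if locus in ffpe]
    if not shared:
        return 0.0, {}
    r_by_locus = {locus: pearson(ff[locus], ffpe[locus]) for locus in shared}
    # NaN compares False against the threshold, so such loci are not counted.
    strong = [locus for locus, r in r_by_locus.items() if r >= threshold]
    return len(strong) / len(shared), r_by_locus


# Toy data for illustration only; locus IDs and values are made up.
ff_toy = {"locus_a": [0.1, 0.2, 0.3, 0.4], "locus_b": [0.1, 0.2, 0.3, 0.4]}
ffpe_toy = {"locus_a": [0.15, 0.25, 0.35, 0.45], "locus_b": [0.4, 0.3, 0.2, 0.1]}
fraction, correlations = fraction_strongly_correlated(ff_toy, ffpe_toy)
```

A per-locus correlation of this kind is a stricter check than a single genome-wide correlation, because a high overall correlation can be driven by loci that are consistently near 0 or 1 in every sample.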

Dataset Information

A revision of the InfiniumPurify R package for genome-wide correction of tumor purity in Infinium DNA methylation array data
