Dataset Information

A data-driven approach to preprocessing Illumina 450K methylation array data

ABSTRACT:

Background: As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterised. Thus, there is a need for rapid and cost-effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods, but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets.

Results: The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100), where M and U are methylated and unmethylated signal intensities. Betas calculated from raw signal intensities (the default GenomeStudio behaviour) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive.

Conclusions: Careful selection of preprocessing steps can minimise variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes. For the convenience of the research community we have created an R software package called wateRmelon, compatible with the existing methylumi, minfi and IMA packages, that allows others to utilise the same normalization methods and data quality tests on 450K data.
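The beta formula and the idea of normalizing M and U separately before computing betas can be illustrated with a minimal Python sketch. This is not the wateRmelon implementation: the function names (`beta_value`, `quantile_normalize`, `normalized_betas`) are hypothetical, the quantile normalization is the plain textbook version with ties broken by probe order, and the intensities in the usage note are made up for illustration.

```python
from typing import List


def beta_value(m: float, u: float, offset: float = 100.0) -> float:
    """Beta as given in the abstract: M / (M + U + 100)."""
    return m / (m + u + offset)


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Plain quantile normalization (illustrative, not wateRmelon's code).

    Each inner list holds one sample's intensities for the same probes,
    in the same probe order. Ties are broken by probe order.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must cover the same probes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_probes)
    ]

    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


def normalized_betas(
    m_samples: List[List[float]],
    u_samples: List[List[float]],
    offset: float = 100.0,
) -> List[List[float]]:
    """Normalize M and U separately, then compute betas per probe."""
    m_norm = quantile_normalize(m_samples)
    u_norm = quantile_normalize(u_samples)
    return [
        [beta_value(m, u, offset) for m, u in zip(ms, us)]
        for ms, us in zip(m_norm, u_norm)
    ]
```

For example, `beta_value(900, 0)` gives 0.9 rather than 1.0 because of the offset of 100 in the denominator. To follow the abstract's further recommendation, one could call `normalized_betas` once on the Type I probes and once on the Type II probes instead of on all probes together; how the probes are split by type is left to the caller in this sketch.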

ORGANISM(S): Homo sapiens

PROVIDER: GSE43414 | GEO | 2013/01/24

SECONDARY ACCESSION(S): PRJNA187197

REPOSITORIES: GEO


Similar Datasets

Project description: Bisulfite-converted DNA from the 11 cohorts (N=695, including 36 technical replicates) was hybridised to the Illumina Infinium 450K Human Methylation BeadChip v1.2.

Project description: The proper identification of differentially methylated CpGs is central to most epigenetic studies. The Illumina Human Methylation 450K BeadChip is widely used to quantify DNA methylation; nevertheless, the design of an appropriate analysis pipeline faces severe challenges due to the convolution of biological and technical variability and the presence of a signal bias between Infinium I and II probe design types. Despite recent attempts to investigate how to analyze DNA methylation data with such an array design, it has not been possible to perform a comprehensive comparison between different bioinformatics pipelines due to the lack of appropriate datasets having both large sample size and a sufficient number of technical replicates. Here we perform such a comparative analysis, targeting the problems of reducing the technical variability, eliminating the probe design bias and reducing the batch effect, by exploiting two unpublished datasets that included technical replicates and were profiled for DNA methylation on peripheral blood, monocytes or muscle biopsies. We evaluated the performance of different analysis pipelines and demonstrated that a) it is critical to correct for the probe design type, since the amplitude of the measured methylation change depends on the underlying chemistry; b) the effect of different normalization schemes is mixed, and the most effective methods in our hands were quantile normalization and Beta Mixture Quantile dilation (BMIQ); c) it is beneficial to correct for batch effects. In conclusion, our comparative analysis using a comprehensive dataset suggests an efficient pipeline for proper identification of differentially methylated CpGs using the Illumina 450K arrays. DNA samples from peripheral blood or CD14+ monocytes were included in the study. DNA methylation levels were profiled using Illumina 450K arrays.
Specifically, 50 biological sample replicates from peripheral blood (PB) and 36 biological sample replicates from monocytes were randomly assigned to 8 BeadChips with technical replicates and processed in one run (a total of 96 DNA samples). Eight samples were technically replicated in pairs, while one sample was represented in a trio of replicates. Several analysis pipelines were compared; the uploaded file corresponds to the best-scoring pipeline, which we used for all analyses and conclusions in our publication.
