Dataset Information

MBD-seq as a cost-effective approach for methylome-wide association studies: demonstration in 1500 case–control samples.


ABSTRACT: We studied the use of methyl-CpG binding domain (MBD) protein-enriched genome sequencing (MBD-seq) as a cost-effective screening tool for methylome-wide association studies (MWAS). Because MBD-seq has not yet been applied on a large scale, we first developed and tested a pipeline for data processing using 1500 schizophrenia cases and controls plus 75 technical replicates with an average of 68 million reads per sample. This involved the use of technical replicates to optimize quality control for multi- and duplicate-reads, an in silico experiment to identify CpGs in loci with alignment problems, CpG coverage calculations based on multiparametric estimates of the fragment size distribution, a two-stage adaptive algorithm to combine data from correlated adjacent CpG sites, principal component analyses to control for confounders and new software tailored to handle the large data set. We replicated MWAS findings in independent samples using a different technology that provided single base resolution. In an MWAS of age-related methylation changes, one of our top findings was a previously reported robust association involving GRIA2. Our results also suggested that owing to the many confounding effects, a considerable challenge in MWAS is to identify those effects that are informative about disease processes. This study showed the potential of MBD-seq as a cost-effective tool in large-scale disease studies.
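
The CpG coverage step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes an empirical sample of fragment lengths in place of the multiparametric estimate the abstract describes, it considers forward-strand reads only, and the function names (`survival_from_lengths`, `cpg_coverage`) are hypothetical. The idea is that a read starting `d` bases upstream of a CpG contributes the probability that its fragment is long enough to reach that CpG.

```python
from bisect import bisect_right


def survival_from_lengths(fragment_lengths, max_len):
    """Return a list s where s[d] = P(fragment length > d), for d = 0..max_len.

    Assumption: the distribution is taken directly from an observed sample of
    fragment lengths rather than from a fitted parametric model.
    """
    n = len(fragment_lengths)
    ordered = sorted(fragment_lengths)
    # bisect_right(ordered, d) counts the lengths that are <= d.
    return [(n - bisect_right(ordered, d)) / n for d in range(max_len + 1)]


def cpg_coverage(cpg_pos, read_starts, survival):
    """Expected number of forward-strand fragments that overlap cpg_pos.

    A fragment starting at `start` spans start .. start + length - 1, so it
    covers the CpG when length > cpg_pos - start.
    """
    total = 0.0
    for start in read_starts:
        d = cpg_pos - start  # offset of the CpG from the fragment start
        if 0 <= d < len(survival):
            total += survival[d]
    return total
```

Calling `cpg_coverage` once per CpG with the read start positions on the same chromosome yields a fractional coverage value per site; reverse-strand reads would need the mirrored distance, which this sketch leaves out.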

SUBMITTER: Aberg KA 

PROVIDER: S-EPMC3923085 | biostudies-literature | 2012 Dec

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC5739096 | biostudies-literature
| S-EPMC8920199 | biostudies-literature
2016-11-15 | GSE89872 | GEO
| S-EPMC7595582 | biostudies-literature
2019-12-28 | GSE142656 | GEO
| S-EPMC5066141 | biostudies-literature
| S-EPMC4619007 | biostudies-literature
| S-EPMC4542811 | biostudies-other
2021-02-23 | GSE167300 | GEO
| S-EPMC3599654 | biostudies-literature