Dataset Information

Penalized logistic regression for high-dimensional DNA methylation data with case-control studies.


ABSTRACT:

Motivation

DNA methylation is a molecular modification of DNA that plays a crucial role in the regulation of gene expression. In particular, CpG-rich regions are frequently hypermethylated in cancer tissues but unmethylated in normal tissues. However, compared with microarray gene expression, there is little methodological literature on case-control association studies for high-dimensional DNA methylation data. One key feature of DNA methylation data is its grouped structure: CpG sites from the same gene may be highly correlated. In this article, we propose a penalized logistic regression model for correlated CpG sites within genes from high-dimensional array data. Our regularization procedure combines the L1 penalty with a squared L2 penalty on degree-scaled differences of the coefficients of CpG sites within a gene, so it induces both sparsity and smoothness in the correlated regression coefficients. We combine the penalized procedure with stability selection, which provides a selection probability for each regression coefficient and helps us make a stable and confident selection of CpG sites that may be truly associated with the outcome.
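As a rough illustration of the two ideas described above, the following Python sketch evaluates a penalty of this general shape and estimates selection probabilities by repeated subsampling. The abstract does not give the exact formula, so several things here are assumptions: all CpG sites within a gene are treated as mutually linked, so each site's degree is the gene's group size minus one; the function names `grouped_penalty` and `selection_probabilities`, the tuning parameters `lam1` and `lam2`, and the `select` callback are hypothetical and do not come from the paper.

```python
import math
import random
from itertools import combinations


def grouped_penalty(beta, groups, lam1, lam2):
    """L1 term plus squared L2 term on degree-scaled within-gene differences.

    beta   : list of regression coefficients, one per CpG site
    groups : list of index lists, one list of CpG indices per gene
    Assumption: every pair of sites in a gene is linked, so each site
    in a gene of size m has degree m - 1.
    """
    l1_term = sum(abs(b) for b in beta)
    smooth_term = 0.0
    for idx in groups:
        degree = len(idx) - 1
        if degree < 1:
            continue  # a single-site gene has no within-gene differences
        for j, k in combinations(idx, 2):
            diff = beta[j] / math.sqrt(degree) - beta[k] / math.sqrt(degree)
            smooth_term += diff * diff
    return lam1 * l1_term + lam2 * smooth_term


def selection_probabilities(n_samples, n_features, select, n_rounds=100, seed=0):
    """Fraction of half-size subsamples in which each feature is selected.

    select : hypothetical callback taking a list of row indices and
             returning the indices of features with nonzero coefficients
             (for example, from a penalized fit on those rows).
    """
    rng = random.Random(seed)
    counts = [0] * n_features
    for _ in range(n_rounds):
        rows = rng.sample(range(n_samples), n_samples // 2)
        for j in select(rows):
            counts[j] += 1
    return [c / n_rounds for c in counts]
```

In this sketch, coefficients that agree within a gene contribute nothing to the smoothness term, while coefficients of opposite sign are penalized more heavily, which is one way to read "smoothness with respect to the correlated regression coefficients". A CpG site would then be reported when its selection probability exceeds a chosen threshold.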

Results

Using simulation studies, we demonstrated that the proposed procedure outperforms existing mainstream regularization methods such as the lasso and the elastic net when data are correlated within a group. We also applied our method to identify important CpG sites and corresponding genes for ovarian cancer from over 20,000 CpGs generated with the Illumina Infinium HumanMethylation27K BeadChip. Some of the identified genes are potentially associated with cancers.

SUBMITTER: Sun H 

PROVIDER: S-EPMC3348559 | biostudies-literature | 2012 May

REPOSITORIES: biostudies-literature

Publications

Penalized logistic regression for high-dimensional DNA methylation data with case-control studies.

Sun Hokeun, Wang Shuang

Bioinformatics (Oxford, England), 2012-03-30, issue 10


Similar Datasets

| S-EPMC6527211 | biostudies-literature
| S-EPMC6805595 | biostudies-literature
| S-EPMC2800840 | biostudies-literature
| S-EPMC2732298 | biostudies-literature
| S-EPMC6329526 | biostudies-literature
| S-EPMC2567351 | biostudies-literature
| S-EPMC5860278 | biostudies-literature
| S-EPMC3842118 | biostudies-other
| S-EPMC4862742 | biostudies-literature
| S-EPMC5842543 | biostudies-literature