Dataset Information

DIRECTION: a machine learning framework for predicting and characterizing DNA methylation and hydroxymethylation in mammalian genomes.


ABSTRACT: Motivation: 5-Methylcytosine and 5-hydroxymethylcytosine in DNA are major epigenetic modifications known to significantly alter mammalian gene expression. High-throughput assays to detect these modifications are expensive, labor-intensive, infeasible in some contexts and leave a portion of the genome unqueried. Hence, we devised a novel, supervised, integrative learning framework to perform whole-genome methylation and hydroxymethylation predictions in CpG dinucleotides. Our framework can also impute missing or low-quality data in existing sequencing datasets. Additionally, we developed infrastructure to perform in silico, high-throughput hypothesis testing on such predicted methylation or hydroxymethylation maps. Results: We test our approach on H1 human embryonic stem cells and H1-derived neural progenitor cells. Our predictive model is comparable in accuracy to other state-of-the-art DNA methylation prediction algorithms. We are the first to predict hydroxymethylation in silico with high whole-genome accuracy, paving the way for large-scale reconstruction of hydroxymethylation maps in mammalian model systems. We designed a novel, beam-search-driven feature selection algorithm to identify the most discriminative predictor variables, and developed a platform for performing integrative analysis and reconstruction of the epigenome. Our toolkit DIRECTION provides predictions at single-nucleotide resolution and identifies relevant features based on resource availability. This offers enhanced biological interpretability of results, potentially leading to a better understanding of epigenetic gene regulation. Availability and implementation: http://www.pradiptaray.com/direction, under CC BY-SA license. Contacts: pradiptaray@gmail.com or mchen@utdallas.edu or michael.zhang@utdallas.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

SUBMITTER: Pavlovic M 

PROVIDER: S-EPMC5870843 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature

Publications

DIRECTION: a machine learning framework for predicting and characterizing DNA methylation and hydroxymethylation in mammalian genomes.

Milos Pavlovic, Pradipta Ray, Kristina Pavlovic, Aaron Kotamarti, Min Chen, Michael Q Zhang

Bioinformatics (Oxford, England), 2017-10-01, issue 19


Similar Datasets

2023-08-07 | GSE231345 | GEO
2023-08-07 | GSE231344 | GEO
2023-08-07 | GSE231343 | GEO
S-EPMC8413337 | biostudies-literature
S-EPMC9411552 | biostudies-literature
S-EPMC3048134 | biostudies-literature
S-EPMC8268592 | biostudies-literature
S-EPMC8645205 | biostudies-literature
S-EPMC10770919 | biostudies-literature
S-EPMC10589844 | biostudies-literature