
Dataset Information


On the problem of confounders in modeling gene expression.


ABSTRACT: MOTIVATION: Modeling of Transcription Factor (TF) binding from both ChIP-seq and chromatin accessibility data has become prevalent in computational biology. Several models have been proposed to generate new hypotheses on transcriptional regulation. However, there is no established approach to derive TF binding scores from ChIP-seq and open chromatin experiments. Here, we review biases of various scoring approaches and their effects on the interpretation and reliability of predictive gene expression models. RESULTS: We generated predictive models for gene expression using ChIP-seq and DNase1-seq data from DEEP and ENCODE. Via randomization experiments, we identified confounders in TF gene scores derived from both ChIP-seq and DNase1-seq data. We reviewed correction approaches for both data types, which reduced the influence of the identified confounders without harming model performance. Our analyses also highlighted quality control measures, in addition to model performance, that may help to ensure model reliability and to avoid misinterpretation in future studies. AVAILABILITY AND IMPLEMENTATION: The software used in this study is available online at https://github.com/SchulzLab/TEPIC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
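The randomization logic described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' TEPIC pipeline: the confounder (a hypothetical per-gene count of accessible regions), the log-linear model, and the correction (dividing scores by that count) are all illustrative choices. The idea is that if randomized TF scores predict expression about as well as real ones, the predictive signal probably comes from a gene-level confounder rather than from TF-specific binding, and removing the confounder should make the randomized scores uninformative.

```python
import numpy as np


def heldout_pearson(X, y, train_frac=0.8, seed=0):
    """Fit ordinary least squares on log1p-transformed scores and return the
    Pearson correlation between predicted and observed expression on a
    held-out split of genes."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    cut = int(train_frac * n)
    train, test = idx[:cut], idx[cut:]
    A = np.column_stack([np.ones(n), np.log1p(X)])  # intercept + features
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    pred = A[test] @ coef
    return float(np.corrcoef(pred, y[test])[0, 1])


def simulate(n_genes=2000, n_tfs=20, seed=1):
    """Toy data (hypothetical) in which expression depends only on a
    gene-level confounder, never on TF identity."""
    rng = np.random.default_rng(seed)
    # Hypothetical confounder: number of accessible regions near each gene.
    n_regions = rng.poisson(5, n_genes) + 1
    # TF scores scale with n_regions, as a sum over regions would.
    real = rng.gamma(2.0, 1.0, (n_genes, n_tfs)) * n_regions[:, None]
    # "Randomized" scores: unrelated affinities, same region counts.
    rand = rng.gamma(2.0, 1.0, (n_genes, n_tfs)) * n_regions[:, None]
    y = np.log(n_regions) + rng.normal(0.0, 0.3, n_genes)
    return real, rand, n_regions, y


real, rand, n_regions, y = simulate()
r_real = heldout_pearson(real, y)
r_rand = heldout_pearson(rand, y)
# Illustrative correction: divide out the confounder before modeling.
r_rand_corrected = heldout_pearson(rand / n_regions[:, None], y)
print(f"real={r_real:.2f} randomized={r_rand:.2f} "
      f"randomized+corrected={r_rand_corrected:.2f}")
```

In this toy setup the real and randomized scores reach a similar held-out correlation, because both only encode the region count, while the corrected randomized scores drop to roughly zero. A randomized control that performs as well as the real model is therefore a warning sign, independent of how good the model's raw performance looks.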

SUBMITTER: Schmidt F 

PROVIDER: S-EPMC6530814 | biostudies-literature | 2019 Feb

REPOSITORIES: biostudies-literature


Publications

On the problem of confounders in modeling gene expression.

Florian Schmidt, Marcel H. Schulz

Bioinformatics (Oxford, England), 2019 Feb 1, issue 4



Similar Datasets

| S-EPMC2944732 | biostudies-literature
| S-EPMC5412732 | biostudies-literature
| S-EPMC4736698 | biostudies-literature
| S-EPMC8626902 | biostudies-literature
| S-EPMC2367561 | biostudies-literature
| S-EPMC5916289 | biostudies-literature
| S-EPMC8473983 | biostudies-literature
| S-EPMC3134274 | biostudies-literature
| S-EPMC3426564 | biostudies-literature
| S-EPMC3631628 | biostudies-literature