
Dataset Information


Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences.


ABSTRACT: We introduce and study a set of training-free methods of an information-theoretic and algorithmic-complexity nature, which we apply to DNA sequences to assess their potential to identify nucleosomal binding sites. We test the measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and show their potential to pinpoint high and low nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that the complexity indices are informative of nucleosome occupancy. We found that, while the gold-standard Kaplan model is clearly driven by GC content (by design) and by k-mer training, for high occupancy, entropy and complexity-based scores are also informative and can complement the Kaplan model.
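
As a rough illustration of what a training-free score can look like, the sketch below computes two simple quantities the abstract mentions, GC content and Shannon entropy, over sliding windows of a DNA string. It is not the authors' pipeline and does not cover their algorithmic-complexity indices. The function names, the 147 bp default window (the approximate length of DNA wrapped around a nucleosome core), and the use of overlapping k-mers are all assumptions made for this example.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    n = len(seq) - k + 1  # number of overlapping k-mers
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sliding_scores(seq: str, window: int = 147, step: int = 1, k: int = 1):
    """Yield (start, gc, entropy) for every full window along seq.

    window=147 is an assumed default, not a value taken from the record above.
    """
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), kmer_entropy(chunk, k)
```

Neither score needs fitted parameters, which is the sense of "training-free" used here; how such scores relate to measured occupancy is the subject of the publication itself.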

SUBMITTER: Zenil H 

PROVIDER: S-EPMC6846163 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature


Publications

Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences.

Zenil Hector, Minary Peter

Nucleic Acids Research, 2019-11-01, issue 20



Similar Datasets

| S-EPMC4035975 | biostudies-literature
| S-EPMC5771128 | biostudies-literature
| S-EPMC3218343 | biostudies-other
| S-EPMC3071895 | biostudies-literature
| S-EPMC2441789 | biostudies-literature
| S-EPMC5387343 | biostudies-literature
| S-EPMC2945185 | biostudies-literature
| S-EPMC3883788 | biostudies-literature
| S-EPMC6544651 | biostudies-literature
| EGAS00001004370 | EGA