
Dataset Information


SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps.


ABSTRACT: Genome-wide maps of transcription factor (TF) occupancy and regions of open chromatin implicitly contain DNA sequence signals for multiple factors. We present SeqGL, a novel de novo motif discovery algorithm to identify multiple TF sequence signals from ChIP-, DNase-, and ATAC-seq profiles. SeqGL trains a discriminative model using a k-mer feature representation together with group lasso regularization to extract a collection of sequence signals that distinguish peak sequences from flanking regions. Benchmarked on over 100 ChIP-seq experiments, SeqGL outperformed traditional motif discovery tools in discriminative accuracy. Furthermore, SeqGL can be naturally used with multitask learning to identify genomic and cell-type context determinants of TF binding. SeqGL successfully scales to the large multiplicity of sequence signals in DNase- or ATAC-seq maps. In particular, SeqGL was able to identify a number of ChIP-seq validated sequence signals that were not found by traditional motif discovery algorithms. Thus compared to widely used motif discovery algorithms, SeqGL demonstrates both greater discriminative accuracy and higher sensitivity for detecting the DNA sequence signals underlying regulatory element maps. SeqGL is available at http://cbio.mskcc.org/public/Leslie/SeqGL/.
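The two ingredients named in the abstract, a k-mer feature representation and group lasso regularization, can be illustrated with a minimal sketch. This is not the SeqGL implementation: the abstract does not say how k-mers are assigned to groups or which loss and optimizer are used, so the function names, the example groups, and the threshold value below are all hypothetical. The sketch only shows how overlapping k-mers become a count vector and how the group lasso proximal step shrinks or zeroes whole groups of weights at once.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence, skipping any k-mer
    that contains a character other than A, C, G, or T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_vector(seq, k):
    """Map a sequence to a fixed-length vector of k-mer counts,
    ordered lexicographically over the alphabet ACGT."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = kmer_counts(seq, k)
    return [counts[kmer] for kmer in vocab]


def group_lasso_penalty(weights, groups, lam):
    """Group lasso penalty: lam times the sum of per-group L2 norms.
    `groups` is a list of index lists (a hypothetical grouping)."""
    return lam * sum(
        math.sqrt(sum(weights[i] ** 2 for i in g)) for g in groups
    )


def group_soft_threshold(weights, groups, threshold):
    """Proximal step for the group lasso penalty: shrink each group's
    L2 norm by `threshold`, zeroing the whole group if its norm is
    at or below the threshold."""
    out = list(weights)
    for g in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = weights[i] * scale
    return out
```

Because the penalty acts on the norm of each group rather than on individual weights, weak groups are removed as a unit while strong groups are kept with all their members, which is what lets a model of this kind return a collection of distinct sequence signals rather than a scattered set of individual k-mers.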

SUBMITTER: Setty M 

PROVIDER: S-EPMC4446265 | biostudies-literature | 2015 May

REPOSITORIES: biostudies-literature


Publications

SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps.

Manu Setty, Christina S. Leslie

PLoS Computational Biology, 2015-05-27, issue 5



Similar Datasets

Date | Accession | Repository
 | S-EPMC5100580 | biostudies-literature
 | S-EPMC4240582 | biostudies-literature
 | S-EPMC2953742 | biostudies-literature
 | S-EPMC3293935 | biostudies-literature
 | S-ECPF-GEOD-49955 | biostudies-other
2014-11-21 | E-GEOD-58714 | biostudies-arrayexpress
 | S-EPMC6320013 | biostudies-literature
 | S-EPMC548334 | biostudies-literature
2014-11-21 | GSE58714 | GEO
 | S-EPMC4480131 | biostudies-literature