Dataset Information

Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.


ABSTRACT:

Motivation

An accurate characterization of the transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. Although recent advances in biotechnology have created opportunities to build binding affinity prediction methods, accurately characterizing the TF-DNA binding affinity landscape remains a challenging problem.

Results

Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model that captures both position-specific information and long-range dependencies in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses the embedded features to build a predictive model. Our method combines the strengths of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as state-of-the-art binding affinity prediction methods.
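As a rough illustration of what a message passing-like embedding over a sequence can look like, the following minimal sketch treats each nucleotide as a node in a chain, repeatedly updates a per-position feature vector from the position's own base and its neighbours' vectors, and sums the result into one fixed-length embedding. It is not the authors' implementation: the chain structure, the ReLU update, the sum pooling, the random untrained weights and all names (`embed_sequence`, `predict_affinity`, `W_INPUT`, `W_NEIGHBOR`, `W_OUT`, `DIM`) are assumptions made for the example.

```python
import random

ALPHABET = "ACGT"


def one_hot(base):
    """Encode a nucleotide as a 4-dimensional indicator vector."""
    return [1.0 if base == b else 0.0 for b in ALPHABET]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def embed_sequence(seq, w_input, w_neighbor, rounds=3):
    """Return a fixed-length embedding of a DNA sequence.

    Each position i keeps a vector mu[i]. In every round, mu[i] is
    recomputed from the base at i and the current vectors of its
    chain neighbours i-1 and i+1 (a message passing-like update).
    """
    dim = len(w_input)
    inputs = [matvec(w_input, one_hot(base)) for base in seq]
    mu = [[0.0] * dim for _ in seq]
    for _ in range(rounds):
        new_mu = []
        for i in range(len(seq)):
            neighbor_sum = [0.0] * dim
            for j in (i - 1, i + 1):
                if 0 <= j < len(seq):
                    neighbor_sum = [a + b for a, b in zip(neighbor_sum, mu[j])]
            message = matvec(w_neighbor, neighbor_sum)
            new_mu.append(relu([a + b for a, b in zip(inputs[i], message)]))
        mu = new_mu
    # Sum pooling turns per-position vectors into one sequence-level vector.
    return [sum(column) for column in zip(*mu)]


def predict_affinity(seq, w_input, w_neighbor, w_out, rounds=3):
    """Linear read-out on top of the embedding (hypothetical predictor)."""
    embedding = embed_sequence(seq, w_input, w_neighbor, rounds)
    return sum(w * e for w, e in zip(w_out, embedding))


rng = random.Random(0)
DIM = 8  # embedding size, chosen arbitrarily for the sketch
W_INPUT = random_matrix(DIM, len(ALPHABET), rng)
W_NEIGHBOR = random_matrix(DIM, DIM, rng)
W_OUT = [rng.uniform(-0.1, 0.1) for _ in range(DIM)]

if __name__ == "__main__":
    print(embed_sequence("ACGTAC", W_INPUT, W_NEIGHBOR))
    print(predict_affinity("ACGTAC", W_INPUT, W_NEIGHBOR, W_OUT))
```

With untrained weights the printed score is meaningless; in a trained model the weight matrices and the read-out would be fitted jointly against measured binding affinities.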

Availability and implementation

Our program is freely available at https://github.com/ramzan1990/sequence2vec.

Contact

xin.gao@kaust.edu.sa or lsong@cc.gatech.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Dai H 

PROVIDER: S-EPMC5870668 | biostudies-literature | 2017 Nov

REPOSITORIES: biostudies-literature

Publications

Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.

Hanjun Dai, Ramzan Umarov, Hiroyuki Kuwahara, Yu Li, Le Song, Xin Gao

Bioinformatics (Oxford, England), 2017 Nov 1, issue 22


Similar Datasets

| S-EPMC5024299 | biostudies-literature
| S-EPMC6717532 | biostudies-literature
| S-EPMC3854512 | biostudies-literature
| S-EPMC7999143 | biostudies-literature
| S-EPMC4643619 | biostudies-literature
| S-EPMC3102690 | biostudies-literature
| S-EPMC4838337 | biostudies-literature
| S-EPMC5694663 | biostudies-literature
| S-EPMC4509691 | biostudies-literature
| S-EPMC8115691 | biostudies-literature