Dataset Information

Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network.


ABSTRACT: Modeling in-vivo protein-DNA binding is both fundamental to understanding regulatory mechanisms and a challenging task in computational biology. Deep-learning-based methods have succeeded in modeling in-vivo protein-DNA binding, but they often (1) follow the fully supervised learning framework and overlook the weakly supervised information in genomic sequences, namely that a bound DNA sequence may have multiple TFBSs, and (2) use one-hot encoding to encode DNA sequences and ignore the dependencies among nucleotides. In this paper, we propose a weakly supervised framework for modeling in-vivo protein-DNA binding, which combines multiple-instance learning with a hybrid deep neural network and uses k-mer encoding to transform DNA sequences. First, the framework segments sequences into multiple overlapping instances using a sliding window, and then encodes all instances into image-like inputs of high-order dependencies using k-mer encoding. Second, it computes a score for each instance in the same bag using a hybrid deep neural network that integrates convolutional and recurrent neural networks. Finally, it integrates the predicted values of all instances into the final prediction for the bag using the Noisy-and method. Experimental results on in-vivo datasets demonstrate the superior performance of the proposed framework. In addition, we explore the performance of the framework when using k-mer encoding, demonstrate the performance of the Noisy-and method by comparing it with other fusion methods, and find that adding recurrent layers improves the performance of the framework.
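The pipeline described in the abstract (sliding-window segmentation, k-mer encoding, Noisy-and fusion) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the window, stride, and k values, and the defaults `a=7.5` and `b=0.5` are all hypothetical. The k-mer encoding is assumed to be a one-hot vector of length 4^k per overlapping k-mer, and the Noisy-and formula follows a commonly cited formulation from the multiple-instance learning literature, which the abstract itself does not spell out. The hybrid convolutional/recurrent scoring network is omitted; the instance scores below are stand-in values.

```python
import math
from itertools import product


def segment(seq, window, stride):
    """Split a sequence into overlapping instances (together they form one bag)."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def kmer_one_hot(instance, k):
    """Encode each overlapping k-mer as a one-hot vector of length 4**k.

    Assumption: this is one plausible reading of "k-mer encoding";
    k-mers containing symbols other than A, C, G, T stay all-zero.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    columns = []
    for i in range(len(instance) - k + 1):
        col = [0] * (4 ** k)
        kmer = instance[i:i + k]
        if kmer in index:
            col[index[kmer]] = 1
        columns.append(col)
    return columns


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def noisy_and(probs, a=7.5, b=0.5):
    """Fuse instance probabilities into one bag-level probability.

    Assumed formulation: a sigmoid of the mean instance probability with
    slope a and threshold b, rescaled so that a mean of 0 maps to 0 and
    a mean of 1 maps to 1.
    """
    mean = sum(probs) / len(probs)
    low = sigmoid(-a * b)
    return (sigmoid(a * (mean - b)) - low) / (sigmoid(a * (1 - b)) - low)


# Usage with a toy sequence; the instance scores are placeholders for
# the outputs of the (omitted) hybrid neural network.
bag = segment("ACGTACGTAC", window=6, stride=2)
encoded = [kmer_one_hot(inst, k=2) for inst in bag]
stand_in_scores = [0.9, 0.2, 0.7]
bag_score = noisy_and(stand_in_scores)
```

With k = 1 the encoding reduces to ordinary one-hot encoding, which gives a simple way to compare the two representations in such a sketch.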

SUBMITTER: Zhang Q 

PROVIDER: S-EPMC6559991 | biostudies-literature | 2019 Jun

REPOSITORIES: biostudies-literature

Publications

Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network.

Zhang Qinhu, Shen Zhen, Huang De-Shuang

Scientific Reports, 2019-06-11, 1



Similar Datasets

| S-EPMC5870851 | biostudies-other
| S-EPMC9310706 | biostudies-literature
| S-EPMC8009201 | biostudies-literature
| S-EPMC3850986 | biostudies-literature
| S-EPMC6192215 | biostudies-literature
| S-EPMC6203325 | biostudies-literature
| S-EPMC8426140 | biostudies-literature
| S-EPMC5543478 | biostudies-other
| S-EPMC3439725 | biostudies-literature
| S-EPMC7332070 | biostudies-literature