
Dataset Information


Prediction of enhancer-promoter interactions via natural language processing.


ABSTRACT: BACKGROUND: Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation and disease mechanisms. Distinguishing true interactions from nearby non-interacting pairs remains challenging because traditional experimental methods are limited by low resolution or low throughput. RESULTS: We propose a novel computational framework, EP2vec, to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method from natural language processing. We then train a classifier to predict EPIs from the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, outperforming existing methods. We demonstrate the robustness of sequence embedding features by carrying out a sensitivity analysis. In addition, we identify motifs that represent cell line-specific information by analyzing the learned sequence embedding features with an attention mechanism. Finally, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features with experimental features. CONCLUSIONS: EP2vec sheds light on feature extraction for DNA sequences of arbitrary length and provides a powerful approach for EPI identification.
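The two-stage idea in the abstract (map each variable-length sequence to a fixed-length vector, then score a classifier by F1) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the record: the k-mer "word" tokenization, the function names, and the concatenation of enhancer and promoter vectors are hypothetical, and a plain k-mer frequency vector stands in for the learned embedding, which it does not reproduce.

```python
from collections import Counter
from itertools import product


def kmer_words(seq, k=6, stride=1):
    """Split a DNA sequence into k-mer 'words' (assumed tokenization)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_frequency_vector(seq, k=2):
    """Map a sequence of any length to a vector with exactly 4**k entries.

    Stand-in for a learned sequence embedding: k-mer frequencies only.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(kmer_words(seq, k))
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def pair_features(enhancer, promoter, k=2):
    """Concatenate the two fixed-length vectors (assumed way to form a pair)."""
    return kmer_frequency_vector(enhancer, k) + kmer_frequency_vector(promoter, k)


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = EPI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Sequences of different lengths yield feature vectors of the same size, so any standard supervised classifier could be trained on the output of `pair_features`; the choice of classifier is left open here because the abstract does not name one.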

SUBMITTER: Zeng W 

PROVIDER: S-EPMC5954283 | biostudies-literature | 2018 May

REPOSITORIES: biostudies-literature


Publications

Prediction of enhancer-promoter interactions via natural language processing.

Zeng Wanwen, Wu Mengmeng, Jiang Rui

BMC Genomics, 2018-05-09, Suppl 2



Similar Datasets

| S-EPMC6961579 | biostudies-literature
| S-EPMC8472018 | biostudies-literature
| S-EPMC9252822 | biostudies-literature
| S-EPMC7856032 | biostudies-literature
| S-EPMC7486862 | biostudies-literature
| S-EPMC6735851 | biostudies-literature
| S-EPMC6722039 | biostudies-literature
| S-EPMC7592802 | biostudies-literature
| S-EPMC7938488 | biostudies-literature
| S-EPMC7965059 | biostudies-literature