Dataset Information

Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism.


ABSTRACT: DNA-binding proteins (DBPs) perform diverse biological functions ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether available methods that can accurately predict DNA-binding sites in known DBPs can also identify novel DBPs. Moreover, sequence information is blind to the cellular- and disease-specific contexts of DBP activities, whereas the under-utilized knowledge from public gene expression data offers great promise. To address these issues, we have developed novel methods for predicting DBPs by integrating sequence and gene expression-derived features and applied them to explore human, mouse and Arabidopsis proteomes. While our sequence-based models outperformed the gene expression-based ones, some proteins with weaker DBP-like sequence features were correctly predicted by gene expression-based features, suggesting that these proteins acquire tangible DBP functionality in a conducive gene expression environment. Analysis of motif enrichment among the co-expressed genes of the top 100 candidate DBPs from hitherto unannotated genes provides further avenues to explore their functional associations.

SUBMITTER: Ahmad S 

PROVIDER: S-EPMC5758906 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism.

Shandar Ahmad, Philip Prathipati, Lokesh P. Tripathi, Yi-An Chen, Ajay Arya, Yoichi Murakami, Kenji Mizuguchi

Nucleic Acids Research, 2018-01-01, issue 1


Similar Datasets

2013-05-25 | E-GEOD-46611 | biostudies-arrayexpress
2013-05-25 | GSE46611 | GEO
| S-EPMC3818907 | biostudies-literature
| S-EPMC2913925 | biostudies-literature
| S-EPMC2876955 | biostudies-literature
| S-EPMC8142496 | biostudies-literature
| S-EPMC3787635 | biostudies-literature
| S-EPMC5964011 | biostudies-literature
| S-EPMC10774124 | biostudies-literature
| S-EPMC7145599 | biostudies-literature