
Dataset Information


Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique.


ABSTRACT: Promoters are genomic regions upstream of genes that RNA polymerase binds to initiate transcription. Because promoters are the most critical elements of gene expression, recognizing them is crucial for understanding how gene expression is regulated. This study aimed to develop a machine learning-based model to predict promoters in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promoter sequences were encoded with three kinds of feature descriptors: accumulated nucleotide frequency, k-mer nucleotide composition, and binary encoding. The resulting features were optimized using correlation and an mRMR-based algorithm. The optimized features were then fed into a random forest (RF) classifier to discriminate promoter sequences from non-promoter sequences in A. tumefaciens strain C58. In 10-fold cross-validation, the proposed model achieved an overall accuracy of 0.837. The model should support the study of promoters in A. tumefaciens strain C58.
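The three feature descriptors named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the default k = 2, and the handling of ambiguous bases are assumptions made for illustration, and the accumulated nucleotide frequency shown here uses one common definition (the density of each base within the prefix ending at its position), which may differ from the variant used in the study.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq, k=2):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with ambiguous bases (e.g. N)
            counts[window] += 1
    return [counts[km] / total for km in kmers]


def binary_encoding(seq):
    """One-hot encode each base: A=1000, C=0100, G=0010, T=0001."""
    table = {b: [int(b == a) for a in ALPHABET] for b in ALPHABET}
    vec = []
    for base in seq:
        # assumption: unknown bases map to an all-zero vector
        vec.extend(table.get(base, [0, 0, 0, 0]))
    return vec


def accumulated_nucleotide_frequency(seq):
    """For each 1-based position i, the fraction of seq[:i] equal to the base at i.

    Assumed definition; the study may use a different variant.
    """
    seen = {}
    densities = []
    for i, base in enumerate(seq, start=1):
        seen[base] = seen.get(base, 0) + 1
        densities.append(seen[base] / i)
    return densities


def encode(seq, k=2):
    """Concatenate the three descriptors into one feature vector."""
    seq = seq.upper()
    return (
        kmer_composition(seq, k)
        + binary_encoding(seq)
        + accumulated_nucleotide_frequency(seq)
    )
```

For a sequence of length n, `encode` returns 4^k + 4n + n values. Vectors like these would then go through feature selection and into a classifier such as a random forest. The selection thresholds and classifier settings are not given in this record, so they are left out of the sketch.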

SUBMITTER: Zulfiqar H 

PROVIDER: S-EPMC10133480 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature


Publications

Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique.

Zulfiqar Hasan H, Ahmed Zahoor Z, Kissanga Grace-Mercure Bakanina B, Hassan Farwa F, Zhang Zhao-Yue ZY, Liu Fen F

Frontiers in Microbiology, 2023-04-13



Similar Datasets

Date | Accession | Repository
 | S-EPMC3439454 | biostudies-literature
 | S-EPMC2648182 | biostudies-literature
 | PRJNA869899 | ENA
 | PRJNA270111 | ENA
 | PRJNA869898 | ENA
 | S-EPMC93418 | biostudies-literature
2021-12-29 | GSE174467 | GEO
2015-07-24 | GSE71267 | GEO
 | S-EPMC3408200 | biostudies-literature
2014-09-25 | GSE61737 | GEO