Dataset Information

Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria.


ABSTRACT: This paper presents the prediction of Bacillus subtilis promoters using a Support Vector Machine (SVM) system. Less information is available in the literature on promoter sequences of Gram-positive bacteria than on those of Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. We first collected the B. subtilis genome sequence from the NCBI database and identified promoters by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data, we developed a Python script to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which are recognized by the sigma factor SigA. To validate these data, we developed a software package called BacSVM+, which takes promoters as input and returns the best combination of LibSVM parameters for predicting promoter regions in the bacterium used in the simulation. All collected data and the BacSVM+ software are available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip.
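
The abstract mentions a Python script that searches the B. subtilis genome for promoters but does not show it. The following is only a minimal sketch of what such a search could look like, assuming exact string matching on both strands; the function names (reverse_complement, find_promoters) and the toy sequences are hypothetical and are not taken from the authors' script or from the data set.

```python
# Illustrative sketch only: hypothetical helper names, toy data,
# and exact matching are assumptions, not the authors' method.

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_promoters(genome, promoters):
    """Locate exact occurrences of known promoter sequences in a genome.

    genome    -- forward-strand sequence as a string
    promoters -- dict mapping a promoter name to its sequence
    Returns a list of (name, strand, start) tuples, where start is the
    0-based position of the match on the forward strand.
    """
    genome = genome.upper()
    hits = []
    for name, seq in promoters.items():
        seq = seq.upper()
        # A promoter on the reverse strand appears on the forward strand
        # as its reverse complement.
        for strand, pattern in (("+", seq), ("-", reverse_complement(seq))):
            start = genome.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                # Step by one base so overlapping matches are also found.
                start = genome.find(pattern, start + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "AAATTGACATTTCCCTGTCAAGG"  # made-up sequence
    toy_promoters = {"toy": "TTGACA"}       # made-up entry
    for hit in find_promoters(toy_genome, toy_promoters):
        print(hit)
```

Note that a sequence equal to its own reverse complement would be reported once per strand at the same position, so a real pipeline would likely deduplicate such hits.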

SUBMITTER: Coelho RV 

PROVIDER: S-EPMC5993011 | biostudies-literature | 2018 Aug

REPOSITORIES: biostudies-literature

Publications

Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria.

Coelho, Rafael Vieira; de Avila E Silva, Scheila; Echeverrigaray, Sergio; Delamare, Ana Paula Longaray

Data in Brief, 2018-05-13


Similar Datasets

Date | Accession | Repository
| S-EPMC151965 | biostudies-literature
| S-EPMC5743806 | biostudies-literature
| S-EPMC3165636 | biostudies-literature
| S-EPMC6881298 | biostudies-literature
| S-EPMC4079059 | biostudies-literature
| S-EPMC52658 | biostudies-other
| S-EPMC5116874 | biostudies-literature
| S-EPMC4880631 | biostudies-literature
2014-05-30 | GSE57245 | GEO
| S-EPMC205916 | biostudies-other