Dataset Information

DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

ABSTRACT: DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
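
The abstract names two technical ingredients: nucleotide-composition features computed from DNA sequences, and evaluation by accuracy and the Matthews correlation coefficient (MCC). The following is a minimal sketch of both, assuming plain A/C/G/T sequences. It is not the authors' code. It covers only mono-, di- and trinucleotide composition (4 + 16 + 64 = 84 values), omits the physicochemical-property features, and does not perform the random-forest selection of the 174 optimal features. The function names kmer_composition and accuracy_and_mcc are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_composition(seq: str, k: int) -> dict[str, float]:
    """Normalized frequency of every length-k word over A/C/G/T in seq.

    Hypothetical helper: windows containing any other symbol (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    # Divide by the number of valid windows so the frequencies sum to 1.
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Accuracy and Matthews correlation coefficient from a binary confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    seq = "ACGTACGTTAGC"
    # Concatenate mono-, di- and trinucleotide composition into one feature vector.
    features = [v for k in (1, 2, 3) for v in kmer_composition(seq, k).values()]
    print(len(features))  # → 84
    print(accuracy_and_mcc(50, 50, 0, 0))  # → (1.0, 1.0)
```

In a pipeline like the one described, vectors like these would be ranked (for example by random-forest feature importance) before an SVM is trained on the selected subset. That step is not shown here.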

SUBMITTER: Manavalan B

PROVIDER: S-EPMC5788611 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

Manavalan B, Shin TH, Lee G

Oncotarget, 2017 Dec 8 (issue 2)

PMID: 29416743

Similar Datasets

A support vector machine and a random forest classifier indicates a 15-miRNA set related to osteosarcoma recurrence.
| S-EPMC5759858 | biostudies-literature

Index and biological spectrum of human DNase I hypersensitive sites.
| S-EPMC7422677 | biostudies-literature

DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays.
| S-EPMC2698431 | biostudies-literature

Analysis of Asperger Syndrome Using Genetic-Evolutionary Random Support Vector Machine Cluster.
| S-EPMC6262410 | biostudies-other

Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine.
| S-EPMC4128822 | biostudies-literature

Design, synthesis and experimental validation of novel potential chemopreventive agents using random forest and support vector machine binary classifiers.
| S-EPMC5600879 | biostudies-literature

Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest.
| S-EPMC3089490 | biostudies-literature

Identification of Penicillin-binding proteins employing support vector machines and random forest.
| S-EPMC3705620 | biostudies-literature

Prediction of DNase I hypersensitive sites by using pseudo nucleotide compositions.
| S-EPMC4152949 | biostudies-literature

Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis.
| S-EPMC10097675 | biostudies-literature