
Dataset Information


Feature selection for gene prediction in metagenomic fragments.


ABSTRACT: Background: Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene prediction algorithms use different combinations of classification algorithms to distinguish between coding and non-coding sequences. Results: In this study, we apply a filter method to select relevant features from a large set of known features instead of combining them using linear classifiers or ignoring their individual coding potential. We use minimum redundancy maximum relevance (mRMR) to select the most relevant features. Support vector machines (SVM) are trained using these features, and the classification score is transformed into the posterior probability of the coding class. A greedy algorithm uses the probabilities of overlapping candidate genes to select the final genes. Instead of using one model for all sequences, we train an ensemble of SVM models on mutually exclusive datasets based on GC content and use the appropriate model to classify candidate genes based on the GC content of their read. Conclusion: Our proposed algorithm achieves an improvement over some existing algorithms. mRMR produces promising results in gene prediction. It improves classification performance and feature interpretation. Our research serves as a basis for future studies on feature selection for gene prediction.
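
The last two steps of the abstract (routing a read to a GC-specific model, then greedily resolving overlapping candidate genes by posterior probability) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` record, the GC bin bounds, the 0.5 probability threshold, and the rule that any overlap counts as a conflict are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate gene on a read (coordinates are half-open)."""
    start: int    # inclusive start position on the read
    end: int      # exclusive end position on the read
    prob: float   # posterior probability of the coding class


def gc_content(seq):
    """Fraction of G and C bases in a read; 0.0 for an empty read."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def pick_model(gc, upper_bounds):
    """Index of the first GC bin whose upper bound covers `gc`.

    `upper_bounds` is an assumed ascending list of bin limits,
    one per model in the ensemble.
    """
    for i, bound in enumerate(upper_bounds):
        if gc <= bound:
            return i
    return len(upper_bounds) - 1


def overlaps(a, b, max_overlap=0):
    """True if two candidates share more than `max_overlap` positions."""
    return min(a.end, b.end) - max(a.start, b.start) > max_overlap


def greedy_select(candidates, threshold=0.5, max_overlap=0):
    """Keep the most probable candidates that do not conflict.

    Candidates are visited in order of decreasing probability; each one
    is kept only if it does not overlap an already kept candidate.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c.prob, reverse=True):
        if cand.prob < threshold:
            break  # all remaining candidates are below the threshold
        if all(not overlaps(cand, k, max_overlap) for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.start)


if __name__ == "__main__":
    read = "GGCCAATT"
    model_index = pick_model(gc_content(read), [0.4, 0.6, 1.0])
    print("model index:", model_index)

    cands = [
        Candidate(0, 300, 0.9),
        Candidate(250, 600, 0.8),
        Candidate(320, 700, 0.7),
        Candidate(710, 900, 0.3),
    ]
    for c in greedy_select(cands):
        print(c)
```

In a real pipeline the `prob` values would come from the SVM selected by `pick_model`, after its classification score has been mapped to a posterior probability; here they are supplied by hand.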

SUBMITTER: Al-Ajlan A 

PROVIDER: S-EPMC6047368 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature


Publications

Feature selection for gene prediction in metagenomic fragments.

Al-Ajlan Amani A, El Allali Achraf A

BioData Mining, 2018-06-07



Similar Datasets

| S-EPMC5698827 | biostudies-literature
| S-EPMC3622649 | biostudies-literature
| S-EPMC2409338 | biostudies-literature
| S-EPMC6245785 | biostudies-other
| S-EPMC7287073 | biostudies-literature
| S-EPMC3376124 | biostudies-other
| S-EPMC4485860 | biostudies-literature
| S-EPMC9006223 | biostudies-literature
| S-EPMC11009020 | biostudies-literature
| S-EPMC4827277 | biostudies-literature