Dataset Information

CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction.


ABSTRACT: Accurate gene prediction in metagenomics fragments is computationally challenging because the reads are short and the data are incomplete and fragmented. Most gene-prediction programs extract a large number of features and then apply statistical or supervised classification approaches to predict genes. In our study, we introduce CNN-MGP, a convolutional neural network program for metagenomics gene prediction that predicts genes in metagenomics fragments directly from raw DNA sequences, without manual feature extraction and feature selection stages. CNN-MGP learns the characteristics of coding and non-coding regions and distinguishes coding from non-coding open reading frames (ORFs). We train 10 CNN models on 10 mutually exclusive datasets based on pre-defined GC content ranges. We extract ORFs from each fragment; the ORFs are then encoded numerically and fed into the appropriate CNN model based on the fragment's GC content. The output of the CNN is the probability that an ORF encodes a gene. Finally, a greedy algorithm selects the final gene list. Overall, CNN-MGP is effective and achieves 91% accuracy on the testing dataset. CNN-MGP demonstrates the ability of deep learning to predict genes in metagenomics fragments, and its accuracy is higher than or comparable to that of state-of-the-art gene-prediction programs that use pre-defined features.
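The routing and encoding steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the GC range boundaries or the numeric encoding, so the equal-width bins, the one-hot scheme, and the function names `gc_content`, `model_index`, and `one_hot_encode` below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

NUM_MODELS = 10  # the abstract states 10 CNN models, one per GC-content range

# ASSUMPTION: a one-hot scheme over A, C, G, T; the abstract only says
# that ORFs are "encoded numerically".
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def model_index(gc: float, num_models: int = NUM_MODELS) -> int:
    """Map a GC fraction in [0, 1] to a model index.

    ASSUMPTION: equal-width bins. The actual pre-defined GC ranges are
    not given in the abstract.
    """
    return min(int(gc * num_models), num_models - 1)


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element vector; unknown bases become all zeros."""
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]


if __name__ == "__main__":
    fragment = "ATGCGCTAA"  # toy fragment, not real data
    gc = gc_content(fragment)
    print("GC fraction:", gc)
    print("model index:", model_index(gc))
    print("encoded length:", len(one_hot_encode(fragment)))
```

In this sketch the GC content is computed on the whole fragment, matching the abstract's statement that the model is chosen from the fragment GC content, while the encoding would be applied to each ORF extracted from that fragment.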

SUBMITTER: Al-Ajlan A 

PROVIDER: S-EPMC6841655 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction.

Al-Ajlan Amani A, El Allali Achraf A

Interdisciplinary sciences, computational life sciences, 2018-12-27, issue 4


Similar Datasets

| S-EPMC6488737 | biostudies-literature
| S-EPMC7689358 | biostudies-literature
| S-EPMC7006502 | biostudies-literature
| S-EPMC8682773 | biostudies-literature
| S-EPMC7098575 | biostudies-literature
| S-EPMC5986618 | biostudies-literature
| S-EPMC7279608 | biostudies-literature
| S-EPMC7220942 | biostudies-literature
| S-EPMC8609763 | biostudies-literature
| S-EPMC5932613 | biostudies-literature