Dataset Information

ODNA: identification of organellar DNA by machine learning.


ABSTRACT:

Motivation

Identifying organellar DNA, such as mitochondrial or plastid sequences, within a whole-genome assembly remains challenging and requires biological background knowledge. To address this, we developed ODNA, which combines genome annotation and machine learning to perform this identification.

Results

ODNA is software that classifies organellar DNA sequences within a genome assembly using machine learning on top of a predefined genome annotation workflow. We trained our model on 829 769 DNA sequences from 405 genome assemblies and achieved high predictive performance on independent validation data (e.g. a Matthews correlation coefficient of 0.61 for mitochondria and 0.73 for chloroplasts), significantly outperforming existing approaches.
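For readers unfamiliar with the metric reported above, the Matthews correlation coefficient (MCC) can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula only; the function name and the example counts are hypothetical and are not taken from the ODNA source code or its validation data.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the Matthews correlation coefficient for binary counts.

    tp/tn/fp/fn are true positives, true negatives, false positives
    and false negatives. The result lies in [-1, 1]; 0 means the
    prediction is no better than chance.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: MCC is defined as 0 when any marginal is empty.
        return 0.0
    return numerator / denominator


# Hypothetical counts for an imbalanced case (few organellar sequences
# among many nuclear ones); NOT values from the paper.
example = matthews_corrcoef(tp=80, tn=900, fp=30, fn=20)
print(f"MCC = {example:.2f}")
```

MCC is often preferred over accuracy for this kind of task because organellar sequences are typically a small minority of an assembly, and MCC only scores well when both classes are predicted well.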

Availability and implementation

Our software ODNA is freely accessible as a web service at https://odna.mathematik.uni-marburg.de and can also be run in a Docker container. The source code is available at https://gitlab.com/mosga/odna and the processed data at Zenodo (DOI: 10.5281/zenodo.7506483).

SUBMITTER: Martin R 

PROVIDER: S-EPMC10229373 | biostudies-literature | 2023 May

REPOSITORIES: biostudies-literature

Publications

ODNA: identification of organellar DNA by machine learning.

Martin R, Nguyen MK, Lowack N, Heider D

Bioinformatics (Oxford, England), 2023-05-01, issue 5


Similar Datasets

2023-06-01 | GSE193400 | GEO
| S-EPMC8268518 | biostudies-literature
2020-06-04 | GSE139635 | GEO
| S-EPMC9496475 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
2022-09-14 | E-MTAB-11607 | biostudies-arrayexpress
| S-EPMC5737085 | biostudies-literature
| PRJNA796028 | ENA
| S-EPMC11290435 | biostudies-literature
| S-EPMC6842143 | biostudies-literature