Dataset Information

NGS read classification using AI.

ABSTRACT: Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient's sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, because of this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during sequencing, and the diagnosis only takes place at the data analysis stage, where relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the databases used, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: how can one determine whether truly no evidence of a pathogen is present in the data, or whether the pathogen's genome is simply absent from the database and the sequences in the dataset therefore could not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
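The abstract describes classifying the (segments of) proteins encoded by reads that a database comparison could not assign. As an illustration only, the Python sketch below shows one plausible preprocessing path under our own assumptions; it is not the authors' published pipeline. Each read is translated in all six reading frames with the standard genetic code, each frame is split at stop codons into protein segments, and each segment is one-hot encoded into a fixed-size matrix of the kind a neural network could take as input. A `route` function then mirrors the two-stage logic of the abstract: consult a reference lookup first, and ask a model only about reads left unclassified. All function names (`translate`, `six_frame`, `protein_segments`, `one_hot`, `route`) and the parameters `min_len` and `max_len` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def translate(dna: str) -> str:
    """Translate DNA codon by codon; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame(read: str) -> List[str]:
    """Translate a read in the three forward and three reverse-complement frames."""
    read = read.upper()
    rc = read.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (read, rc) for f in range(3)]


def protein_segments(read: str, min_len: int = 10) -> List[str]:
    """Split every frame at stop codons ('*') and keep segments of at least min_len residues."""
    return [seg for frame in six_frame(read) for seg in frame.split("*") if len(seg) >= min_len]


def one_hot(peptide: str, max_len: int) -> List[List[int]]:
    """Encode a peptide as a max_len x 20 matrix; long input is truncated, short input zero-padded."""
    rows = []
    for aa in peptide[:max_len]:
        row = [0] * len(ALPHABET)
        if aa in ALPHABET:  # 'X' and any other symbol stay an all-zero row
            row[ALPHABET.index(aa)] = 1
        rows.append(row)
    rows.extend([[0] * len(ALPHABET) for _ in range(max_len - len(rows))])
    return rows


def route(
    reads: List[str],
    reference_lookup: Callable[[str], Optional[str]],
    predict_domain: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Use the reference database first; fall back to the model only for unclassified reads."""
    results = []
    for read in reads:
        label = reference_lookup(read)
        if label is not None:
            results.append((read, label, "reference"))
        else:
            results.append((read, predict_domain(read), "model"))
    return results
```

In `route`, `reference_lookup` and `predict_domain` are placeholder callables: in practice the first might wrap a database search and the second a trained model's inference step. Passing them in as arguments keeps the fallback order explicit without tying the sketch to any particular tool or model architecture.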

SUBMITTER: Voigt B

PROVIDER: S-EPMC8694450 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

NGS read classification using AI.

Benjamin Voigt, Oliver Fischer, Christian Krumnow, Christian Herta, Piotr Wojciech Dabrowski

PLoS One, 2021-12-22, issue 12

PMID: 34936673

Similar Datasets

Label-free melanoma phenotype classification using AI-based morphological profiling
2025-07-25 | GSE273247 | GEO

Fault Classification for Cooling System of Hydraulic Machinery Using AI.
| S-EPMC10459304 | biostudies-literature

MetaTransformer: deep metagenomic sequencing read classification using self-attention models.
| S-EPMC10495543 | biostudies-literature

Rapid Real-time Squiggle Classification for Read until using RawMap.
| S-EPMC10022530 | biostudies-literature

Detection of copy-number variations from NGS data using read depth information: a diagnostic performance evaluation.
| S-EPMC7852510 | biostudies-literature

AI-driven eyelid tumor classification in ocular oncology using proteomic data.
| S-EPMC11666576 | biostudies-literature

Kart: a divide-and-conquer algorithm for NGS read alignment.
| S-EPMC5860120 | biostudies-literature

Label-free melanoma phenotype classification using AI-based morphological profiling
| PRJNA1140860 | ENA

ReCo: automated NGS read-counting of single and combinatorial CRISPR gRNAs.
| S-EPMC10400375 | biostudies-literature

A Novel Method to Detect Bias in Short Read NGS Data.
| S-EPMC6042817 | biostudies-literature