
Dataset Information


Detection of Pathogenic Microbe Composition Using Next-Generation Sequencing Data.


ABSTRACT: Next-generation sequencing (NGS) technologies have provided great opportunities to analyze pathogenic microbes with high-resolution data. The main goal is to accurately detect the microbial composition and abundances in a sample. However, the high similarity among sequences from different species and the presence of sequencing errors pose various challenges. Numerous methods have been developed for quantifying microbial composition and abundance, but they are not versatile enough to analyze samples that contain a mixture of noise. In this paper, we propose a new computational method, PGMicroD, for detecting the pathogenic microbial composition of a sample using NGS data. The method first filters out potentially mismapped reads and extracts multiple species-related features from the 16S rRNA sequencing reads. It then trains a Support Vector Machine (SVM) classifier to predict the microbial composition. Finally, it assigns all multi-mapped sequencing reads to the references of the predicted species to estimate the abundance of each species. The performance of PGMicroD is evaluated on both simulated and real sequencing data and compared with several existing methods. The results demonstrate that our proposed method achieves superior performance over the compared methods. The software package of PGMicroD is available at https://github.com/BDanalysis/PGMicroD.
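The final step above assigns multi-mapped reads to the references of the predicted species to estimate abundance. The sketch below illustrates that idea under stated assumptions and is not PGMicroD's implementation. The function name `estimate_abundance`, the input format, and the proportional-split rule are all hypothetical. The input maps each read ID to the species whose references the read aligns to. The sketch first restricts reads to the species predicted by the classifier. It then counts uniquely mapped reads directly and splits each multi-mapped read among its candidate species in proportion to their unique-read counts.

```python
from collections import Counter, defaultdict


def estimate_abundance(read_hits, predicted_species):
    """Hypothetical sketch: relative abundance from read-to-species hits.

    read_hits: dict mapping read ID -> list of species the read maps to.
    predicted_species: species predicted present (e.g. by a classifier).
    The proportional split of multi-mapped reads is an assumption,
    not necessarily the rule used by PGMicroD.
    """
    predicted = set(predicted_species)

    # Keep only hits to predicted species; drop reads with no such hit.
    filtered = {}
    for read, hits in read_hits.items():
        kept = sorted(set(hits) & predicted)
        if kept:
            filtered[read] = kept

    # Count reads that map uniquely to a single predicted species.
    unique = Counter(hits[0] for hits in filtered.values() if len(hits) == 1)
    counts = defaultdict(float)
    for species, n in unique.items():
        counts[species] += n

    # Split each multi-mapped read in proportion to unique-read counts,
    # falling back to an even split when no candidate has unique reads.
    for hits in filtered.values():
        if len(hits) < 2:
            continue
        weights = [unique.get(s, 0) for s in hits]
        total = sum(weights)
        for s, w in zip(hits, weights):
            counts[s] += w / total if total else 1 / len(hits)

    # Normalize the read counts to relative abundances.
    n = sum(counts.values())
    return {s: c / n for s, c in sorted(counts.items())} if n else {}


if __name__ == "__main__":
    hits = {
        "r1": ["A"], "r2": ["A"], "r3": ["A"], "r4": ["B"],
        "r5": ["A", "B"], "r6": ["A", "B", "C"], "r7": ["C"],
    }
    print(estimate_abundance(hits, ["A", "B"]))  # → {'A': 0.75, 'B': 0.25}
```

In the usage example, read r7 is discarded because species C is not predicted, and r6 loses its hit to C. The two multi-mapped reads are then split 3:1 between A and B. Apportioning by unique-read evidence is one common heuristic. An iterative (EM-style) reassignment would be a natural alternative.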

SUBMITTER: Zhao H 

PROVIDER: S-EPMC7734255 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Detection of Pathogenic Microbe Composition Using Next-Generation Sequencing Data.

Zhao Haiyong, Wang Shuang, Yuan Xiguo

Frontiers in Genetics, 2020-11-30



Similar Datasets

S-EPMC7527540 | biostudies-literature
S-EPMC5522202 | biostudies-other
S-EPMC3371040 | biostudies-literature
S-EPMC6752280 | biostudies-literature
S-EPMC4021345 | biostudies-literature
S-EPMC3285633 | biostudies-literature
S-EPMC3212839 | biostudies-other
S-EPMC5217656 | biostudies-literature
S-EPMC8138798 | biostudies-literature
S-EPMC9605557 | biostudies-literature