
Dataset Information


A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments.


ABSTRACT: Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) in which sequences predicted to originate from the same genome are automatically grouped together. The most widely used binning methods are reference-independent: they operate de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open-source tool that uses deep siamese neural networks to implement a semi-supervised approach, i.e., SemiBin exploits the information in reference genomes while retaining the capability of reconstructing high-quality bins that are outside the reference dataset. Using simulated and real microbiome datasets from several different habitats in GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, compared to other methods, SemiBin returns more high-quality bins with larger taxonomic diversity, including more distinct genera and species.
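As a rough illustration of the siamese idea mentioned in the abstract, the sketch below applies one shared embedding function to two contigs and scores the pair with a classic contrastive loss. It is not SemiBin's implementation: the toy linear-plus-tanh embedding, the two-dimensional feature vectors, the `same_genome` pair label, and the `margin` value are all assumptions made for this example.

```python
import math

def embed(features, weights):
    """Shared embedding: one linear layer plus tanh (toy stand-in for a deep network)."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(f1, f2, same_genome, weights, margin=1.0):
    """Contrastive loss for one pair of contigs.

    Both inputs pass through the SAME weights (the siamese property).
    Pairs labelled as one genome are pulled together; pairs labelled as
    different genomes are pushed apart until they are `margin` away.
    """
    d = distance(embed(f1, weights), embed(f2, weights))
    if same_genome:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical data: identity weights and made-up 2-D contig features.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
contig_a = [0.9, 0.1]
contig_b = [0.9, 0.1]   # same features as contig_a
contig_c = [-0.9, 0.8]  # clearly different features

print(contrastive_loss(contig_a, contig_b, True, WEIGHTS))   # identical pair, labelled same
print(contrastive_loss(contig_a, contig_c, False, WEIGHTS))  # distant pair, labelled different
```

In a semi-supervised setting of the kind the abstract describes, pair labels like these could plausibly be derived from reference-genome annotations, with the learned embeddings then clustered into bins; that pipeline is outside the scope of this sketch.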

SUBMITTER: Pan S 

PROVIDER: S-EPMC9051138 | biostudies-literature | 2022 Apr

REPOSITORIES: biostudies-literature


Publications

A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments.

Shaojun Pan, Chengkai Zhu, Xing-Ming Zhao, Luis Pedro Coelho

Nature Communications, 2022-04-28, issue 1



Similar Datasets

| S-EPMC4699468 | biostudies-literature
| S-EPMC9238374 | biostudies-literature
| S-EPMC9387265 | biostudies-literature
| S-EPMC11336123 | biostudies-literature
| S-EPMC8315174 | biostudies-literature
| S-EPMC9022542 | biostudies-literature
| S-EPMC6785717 | biostudies-literature
| S-EPMC7889623 | biostudies-literature
| S-EPMC9763041 | biostudies-literature
| S-EPMC9446638 | biostudies-literature