Dataset Information

Effective binning of metagenomic contigs using contrastive multi-view representation learning.


ABSTRACT: Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs).
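The abstract mentions two ingredients that are easy to illustrate: generating several fragments (views) of a contig, and describing each one by its k-mer distribution. The following is a minimal sketch of those two steps only, not the COMEBin implementation. The function names `make_views` and `kmer_profile`, the `min_len` parameter, and the uniform random-substring sampling are all assumptions made for illustration; coverage features and the contrastive training itself are omitted.

```python
import random
from itertools import product


def make_views(contig, n_views=3, min_len=8, seed=0):
    """Return the full contig plus n_views random substrings.

    Hypothetical augmentation scheme: each view is a contiguous
    fragment of random length and random start position.
    """
    if len(contig) < min_len:
        raise ValueError("contig is shorter than min_len")
    rng = random.Random(seed)
    views = [contig]
    for _ in range(n_views):
        length = rng.randint(min_len, len(contig))      # inclusive bounds
        start = rng.randint(0, len(contig) - length)
        views.append(contig[start:start + length])
    return views


def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    contig = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCC"
    for view in make_views(contig):
        profile = kmer_profile(view, k=2)
        print(len(view), round(sum(profile), 6))
```

In a contrastive setup, views derived from the same contig would be treated as positive pairs and views from different contigs as negatives; how COMEBin samples fragments and combines k-mer and coverage features is described in the paper itself.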

SUBMITTER: Wang Z 

PROVIDER: S-EPMC10794208 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature

Publications

Effective binning of metagenomic contigs using contrastive multi-view representation learning.

Wang Ziye, You Ronghui, Han Haitao, Liu Wei, Sun Fengzhu, Zhu Shanfeng

Nature Communications, 2024-01-17, issue 1


Similar Datasets

S-EPMC8883645 | biostudies-literature
S-EPMC3514610 | biostudies-literature
S-EPMC10130187 | biostudies-literature
S-EPMC4828714 | biostudies-literature
S-EPMC8982879 | biostudies-literature
S-EPMC9044235 | biostudies-literature
S-EPMC9154269 | biostudies-literature
S-EPMC11532661 | biostudies-literature
S-EPMC8296540 | biostudies-literature
S-EPMC11373324 | biostudies-literature