Dataset Information

Alignment-free visualization of metagenomic data by nonlinear dimension reduction.


ABSTRACT: The visualization of metagenomic data, especially without prior taxonomic identification of reconstructed genomic fragments, is a challenging problem in computational biology. An ideal visualization method should, among others, enable clear distinction of congruent groups of sequences of closely related taxa, be applicable to fragments of lengths typically achievable following assembly, and allow the efficient analysis of the growing amounts of community genomic sequence data. Here, we report a scalable approach for the visualization of metagenomic data that is based on nonlinear dimension reduction via Barnes-Hut Stochastic Neighbor Embedding of centered log-ratio transformed oligonucleotide signatures extracted from assembled genomic sequence fragments. The approach allows for alignment-free assessment of the data-inherent taxonomic structure, and it can potentially facilitate the downstream binning of genomic fragments into uniform clusters reflecting organismal origin. We demonstrate the performance of our approach by visualizing community genomic sequence data from simulated as well as groundwater, human-derived and marine microbial communities.
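The signature step named in the abstract, centered log-ratio (CLR) transformed oligonucleotide frequencies, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_counts` and `clr`, the oligonucleotide length `k`, the pseudocount used to avoid taking the logarithm of zero, and the skipping of windows containing ambiguous bases are all assumptions made for illustration.

```python
import math
from itertools import product


def kmer_counts(seq, k=4):
    """Count overlapping k-mers over the ACGT alphabet.

    Windows containing any other symbol (e.g. N) are skipped.
    The value of k is an assumption, not taken from the abstract.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    # Return counts in a fixed (lexicographic) k-mer order.
    return [counts[m] for m in kmers]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each component minus the mean of the logs.

    The pseudocount is an assumption; it keeps zero counts finite.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# One signature vector per assembled fragment (toy sequence shown here).
signature = clr(kmer_counts("ACGTACGTTGCAACGT", k=2))
```

Because CLR is invariant to rescaling of its input, the counts do not need to be converted to relative frequencies first. Stacking one such vector per fragment gives a matrix that could then be passed to any Barnes-Hut SNE implementation, for example scikit-learn's `TSNE` with `method="barnes_hut"`, to obtain a two-dimensional embedding.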

SUBMITTER: Laczny CC 

PROVIDER: S-EPMC3970189 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Alignment-free visualization of metagenomic data by nonlinear dimension reduction.

Cedric C. Laczny, Nicolás Pinel, Nikos Vlassis, Paul Wilmes

Scientific Reports, 2014-03-31

Similar Datasets

S-EPMC9296444 | biostudies-literature
S-EPMC8236193 | biostudies-literature
S-EPMC10647110 | biostudies-literature
S-EPMC9867729 | biostudies-literature
S-EPMC6364131 | biostudies-literature
S-EPMC2998530 | biostudies-literature
S-EPMC8677486 | biostudies-literature
S-EPMC10716826 | biostudies-literature
S-EPMC9993663 | biostudies-literature
S-EPMC2935424 | biostudies-literature