Dataset Information

Hierarchical sets: analyzing pangenome structure through scalable set visualizations.


ABSTRACT:

Motivation

The growing number of available microbial genome sequences has increased the size of the pangenomes being analyzed. Current pangenome visualizations were not designed for the pangenome sizes possible today, and new approaches are needed to turn the increase in available information into an increase in knowledge. As the pangenome data structure is essentially a collection of sets, we explore the potential of scalable set visualization as a tool for pangenome analysis.

Results

We present a new hierarchical clustering algorithm based on set arithmetic that optimizes the intersection sizes along the branches. The intersection and union sizes along the hierarchy are visualized using a composite dendrogram and icicle plot, which, in a pangenome context, shows the evolution of pangenome and core size along the evolutionary hierarchy. Outlying elements, i.e. elements whose presence pattern does not correspond with the hierarchy, can be visualized using hierarchical edge bundles. When applied to pangenome data, this plot shows putative horizontal gene transfers between the genomes and can highlight relationships between genomes that are not represented by the hierarchy. We illustrate the utility of hierarchical sets by applying them to a pangenome based on 113 Escherichia and Shigella genomes and find that they provide a powerful addition to pangenome analysis.
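To make the idea of clustering by set arithmetic concrete, here is a minimal Python sketch. It is not the hierarchicalSets implementation: it assumes a simple greedy scheme in which the pair of clusters whose merged intersection is largest relative to its merged union is joined first, and in which merging stops once no two clusters share an element. The function name `cluster_sets` and the toy gene identifiers are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def cluster_sets(sets):
    """Greedy agglomerative clustering of named sets (illustrative sketch only).

    Every cluster carries the intersection ("core") and the union ("pan") of
    all sets below it. At each step the pair with the highest
    len(core) / len(pan) after merging is joined. Pairs with an empty
    intersection are never merged, so several separate trees may remain.
    """
    # Each cluster is (label, core, pan); labels nest as tuples to form a tree.
    clusters = [(name, frozenset(s), frozenset(s)) for name, s in sets.items()]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            core = clusters[i][1] & clusters[j][1]
            pan = clusters[i][2] | clusters[j][2]
            if not core:
                continue  # nothing shared: this pair cannot be merged
            score = len(core) / len(pan)
            if best is None or score > best[0]:
                best = (score, i, j, core, pan)
        if best is None:
            break  # no remaining pair shares any element
        _, i, j, core, pan = best
        label = (clusters[i][0], clusters[j][0])
        merges.append((label, len(core), len(pan)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((label, core, pan))
    return clusters, merges


# Toy "pangenome": each genome is a set of hypothetical gene-group identifiers.
genomes = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3", "g5"},
    "C": {"g1", "g2", "g6"},
    "D": {"g7", "g8"},
}
roots, merges = cluster_sets(genomes)
for label, core_size, pan_size in merges:
    print(label, "core size:", core_size, "pangenome size:", pan_size)
```

In this toy input, genome D shares no gene with the others and therefore stays in its own tree. The core and pangenome sizes recorded at each merge are the kind of intersection and union sizes that a dendrogram or icicle plot of the hierarchy would display.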

Availability and implementation

The described clustering algorithm and visualizations are implemented in the hierarchicalSets R package available from CRAN (https://cran.r-project.org/web/packages/hierarchicalSets).

Contact

thomasp85@gmail.com.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Pedersen TL 

PROVIDER: S-EPMC5447240 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature

Publications

Hierarchical sets: analyzing pangenome structure through scalable set visualizations.

Pedersen Thomas Lin TL  

Bioinformatics (Oxford, England), 2017-06-01, issue 11


Similar Datasets

| S-EPMC3205944 | biostudies-literature
| S-EPMC6881525 | biostudies-literature
| S-EPMC6886512 | biostudies-literature
| S-EPMC3837814 | biostudies-other
| S-EPMC7886675 | biostudies-literature
| S-EPMC5115420 | biostudies-literature
| S-EPMC9333302 | biostudies-literature
| S-EPMC3147474 | biostudies-literature
| S-EPMC6538146 | biostudies-literature
| S-EPMC6194563 | biostudies-literature