Dataset Information

HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks.


ABSTRACT: Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein-protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
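
For readers unfamiliar with the method named in the abstract, the following is a minimal single-machine sketch of the general MCL iteration (alternating expansion and inflation of a column-stochastic matrix). It is not HipMCL's code and does not reflect its distributed MPI/OpenMP design; the function names, the dense list-of-lists matrix, and the default inflation, iteration count, and pruning threshold are all illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [
        [m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def expand(m):
    """Expansion step: square the matrix (two-step random-walk flow)."""
    n = len(m)
    return [
        [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50, prune=1e-6):
    """Toy MCL on a dense adjacency matrix (assumed parameters, fixed iteration count)."""
    n = len(adjacency)
    # Add self-loops, a common MCL preprocessing step, then normalize.
    m = [
        [float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
        # Drop negligible entries and renormalize to keep the matrix sparse-like.
        m = normalize_columns(
            [[x if x >= prune else 0.0 for x in row] for row in m]
        )
    # Each row that retains flow is read as one cluster: its nonzero columns
    # are the member nodes. Identical rows collapse into a single cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > prune)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


def two_triangles(bridge):
    """Hypothetical test graph: triangles {0,1,2} and {3,4,5}, optionally joined by edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
    if bridge:
        edges.append((2, 3))
    a = [[0] * 6 for _ in range(6)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a
```

Calling `mcl(two_triangles(bridge=True))` runs the sketch on a six-node toy graph. The dense matrix product here costs O(n^3) per iteration, which is exactly the kind of cost that makes a sparse, distributed implementation necessary at the scale the abstract describes.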

SUBMITTER: Azad A 

PROVIDER: S-EPMC5888241 | biostudies-literature | 2018 Apr

REPOSITORIES: biostudies-literature

Publications

HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks.

Ariful Azad, Georgios A. Pavlopoulos, Christos A. Ouzounis, Nikos C. Kyrpides, Aydin Buluç

Nucleic Acids Research, 2018-04-01, issue 6


Similar Datasets

S-EPMC3976248 | biostudies-literature
S-EPMC4373852 | biostudies-literature
S-EPMC3218420 | biostudies-other
S-EPMC2853685 | biostudies-literature
S-EPMC5301017 | biostudies-literature
S-EPMC6931356 | biostudies-literature
S-EPMC8080903 | biostudies-literature
S-EPMC4681989 | biostudies-literature
S-EPMC3712314 | biostudies-literature
S-EPMC3948477 | biostudies-other