Dataset Information

A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks.

ABSTRACT: Background: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is to infer the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labelings are often highly unbalanced, that is, one class is largely under-represented. For instance, in automated protein function prediction (AFP), only a few proteins are annotated for most Gene Ontology terms, and in disease-gene prioritization only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches that accurately predict node labels in biological networks are therefore required. Furthermore, such methods must be scalable, since input data can be large, as in multi-species protein networks.
Results: We propose a novel semi-supervised parallel enhancement of COSNet, an imbalance-aware algorithm built on the Hopfield neural model and recently proposed to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be applied efficiently to networks with millions of nodes. The key strategy for speeding up the computations is to partition nodes into independent sets so that each set can be processed in parallel on GPU accelerators. This parallel technique ensures convergence to asymptotically stable attractors while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and on large artificial instances of the problem highlight the scalability and efficiency of the proposed method.
Conclusions: By parallelizing COSNet we achieved an average speed-up of 180x in solving the AFP problem for S. cerevisiae, Mus musculus and Homo sapiens, while lowering memory requirements.
In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.
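The abstract does not give implementation details, so the following is only a minimal Python sketch, under stated assumptions, of why updating a whole independent set at once can preserve asynchronous Hopfield semantics. Assumptions: the graph is stored in CSR (compressed sparse row) form; neurons take plain states +1/-1 with one shared threshold theta, and ties keep the current state; independent sets come from a simple greedy colouring; labeled nodes are clamped. COSNet's own imbalance-aware state values and learned thresholds are not reproduced, and the names to_csr, greedy_independent_sets and run_dynamics are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Illustrative sketch only: plain Hopfield states in {+1, -1} with one shared
# threshold. COSNet's imbalance-aware state values and learned thresholds are
# NOT modelled here; all function names are hypothetical.


def to_csr(n: int, edges: Sequence[Tuple[int, int, float]]):
    """Symmetric CSR arrays (indptr, indices, weights) from undirected weighted edges."""
    rows: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        if u == v:
            continue  # Hopfield networks use a zero diagonal (no self-connections)
        rows[u].append((v, w))
        rows[v].append((u, w))
    indptr, indices, weights = [0], [], []
    for row in rows:
        for v, w in sorted(row):
            indices.append(v)
            weights.append(w)
        indptr.append(len(indices))
    return indptr, indices, weights


def greedy_independent_sets(n: int, indptr: List[int], indices: List[int]) -> List[List[int]]:
    """Greedy colouring; every colour class is an independent set (no internal edge)."""
    color = [-1] * n
    for u in range(n):
        used = {color[v] for v in indices[indptr[u]:indptr[u + 1]] if color[v] >= 0}
        c = 0
        while c in used:
            c += 1
        color[u] = c
    classes: Dict[int, List[int]] = {}
    for u, c in enumerate(color):
        classes.setdefault(c, []).append(u)
    return [classes[c] for c in sorted(classes)]


def run_dynamics(n, indptr, indices, weights, state, clamped: Set[int],
                 theta: float = 0.0, max_sweeps: int = 100) -> List[int]:
    """Update one independent set at a time until no unclamped node changes."""
    blocks = greedy_independent_sets(n, indptr, indices)
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for block in blocks:
            # Phase 1: new states for the whole block, read from current states.
            # No node in the block is a neighbour of another, so this loop could
            # run as one parallel step (e.g. one GPU thread per node).
            new: Dict[int, int] = {}
            for u in block:
                if u in clamped:
                    continue  # labeled nodes keep their known label
                field = sum(weights[k] * state[indices[k]]
                            for k in range(indptr[u], indptr[u + 1]))
                if field > theta:
                    new[u] = 1
                elif field < theta:
                    new[u] = -1
                # field == theta: keep the current state (avoids oscillation)
            # Phase 2: write the block back.
            for u, s in new.items():
                if s != state[u]:
                    state[u] = s
                    changed = True
        if not changed:
            break  # stable state reached
    return state
```

For example, take two triangles {0, 1, 2} and {3, 4, 5} joined by a weak 0.1 link, clamp nodes 0 and 1 to +1 and nodes 4 and 5 to -1, and start the unlabeled nodes 2 and 3 on the "wrong" side ([1, 1, -1, 1, -1, -1]); the dynamics settle at [1, 1, 1, -1, -1, -1] after two sweeps. Because the nodes written in phase 2 never read each other's states, the result equals updating them one by one, so the standard argument that asynchronous Hopfield updates with symmetric weights and a zero diagonal never increase the energy still applies.

On memory, assuming a compiled or GPU implementation with 32-bit indices and weights: a graph with n = 10^6 nodes and m = 5 x 10^6 undirected edges (average degree 10) stores 2m = 10^7 neighbour entries, i.e. about 40 MB of indices, 40 MB of weights and 4 MB of row offsets (roughly 84 MB), whereas a dense float32 weight matrix would need 10^12 x 4 B = 4 TB.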

SUBMITTER: Frasca M

PROVIDER: S-EPMC6191976 | biostudies-other | 2018

REPOSITORIES: biostudies-other


Similar Datasets

Project description: It has been a challenge in systems biology to unravel the relationships between structural properties and dynamic behaviors of biological networks. A Cytoscape plugin named NetDS was recently proposed to analyze the robustness-related dynamics and feed-forward/feedback loop structures of biological networks. Despite this useful functionality, the size of the networks it can analyze is limited by high computational costs. In addition, the plugin cannot verify whether an observed result reflects an intrinsic network property, because it has no function to repeat the observation on a large number of random networks. To overcome these limitations, we have developed a novel software tool, PANET. First, the time-consuming parts of NetDS were redesigned to run in parallel using the OpenCL library, which exploits the full computing power of multi-core central processing units and graphics processing units. This made it possible to investigate a large-scale network such as a human signaling network with 1,609 nodes and 5,063 links. We also developed a new function to perform batch-mode simulations, which generate many random networks and carry out robustness calculations and feed-forward/feedback loop examinations on them. This helps determine whether findings in real biological networks also hold in arbitrary random networks. We tested our plugin in two case studies based on two large-scale signaling networks and found interesting results regarding the relationships between coherently coupled feed-forward/feedback loops and robustness. In addition, we verified through batch-mode simulations whether those findings are consistently conserved in random networks. Taken together, our plugin is expected to be effective for investigating various relationships between dynamics and structural properties in large-scale networks.
Our software tool, user manual and example datasets are freely available at http://panet-csc.sourceforge.net/.
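The batch-mode idea described for PANET, comparing a structural count in a real network with the same count in many random networks, can be illustrated with a small sequential Python sketch. It is not PANET's OpenCL code, and it rests on assumptions: the null model is a uniform random directed graph with the same numbers of nodes and links (the description does not say which random model PANET uses), a feed-forward loop (FFL) is counted as distinct nodes a, b, c with links a->b, b->c and a->c, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]  # directed link a -> b


def count_ffl(n: int, edges: Sequence[Edge]) -> int:
    """Count feed-forward loops: distinct nodes a, b, c with a->b, b->c and a->c."""
    succ: List[Set[int]] = [set() for _ in range(n)]
    for a, b in edges:
        if a != b:  # ignore self-loops
            succ[a].add(b)
    total = 0
    for a in range(n):
        for b in succ[a]:
            # c is a common target of a (directly) and b; exclude a and b themselves
            total += len((succ[a] & succ[b]) - {a, b})
    return total


def random_network(n: int, m: int, rng: random.Random) -> List[Edge]:
    """Uniform random directed graph with m distinct links and no self-loops
    (an assumed null model, chosen for illustration)."""
    if m > n * (n - 1):
        raise ValueError("more links than possible directed pairs")
    chosen: Set[Edge] = set()
    while len(chosen) < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            chosen.add((a, b))
    return sorted(chosen)


def batch_ffl(n: int, m: int, trials: int, seed: int = 0) -> List[int]:
    """FFL counts over `trials` random networks of the same size (seeded, reproducible)."""
    rng = random.Random(seed)
    return [count_ffl(n, random_network(n, m, rng)) for _ in range(trials)]


def fraction_at_least(observed: int, samples: Sequence[int]) -> float:
    """Share of random networks with at least as many FFLs as the observed network."""
    return sum(s >= observed for s in samples) / len(samples)
```

For a network the size of the human signaling network mentioned above (1,609 nodes, 5,063 links), one would call batch_ffl(1609, 5063, trials) and pass the count obtained from the real edge list to fraction_at_least; a small share suggests the observed FFL abundance is unlikely under this null model. A degree-preserving rewiring model is a common alternative null model and could replace random_network without changing the rest of the sketch.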


