Dataset Information

Adapting Community Detection Algorithms for Disease Module Identification in Heterogeneous Biological Networks.

ABSTRACT: Biological networks catalog the complex web of interactions between different molecules, typically proteins, within a cell. These networks are known to be highly modular, with groups of proteins associated with specific biological functions. Human diseases often arise from the dysfunction of one or more proteins in such a functional group. The ability to identify and automatically extract these modules has implications for understanding the etiology of different diseases as well as the functional roles of different protein modules in disease. The recent DREAM challenge posed the problem of identifying disease modules from six heterogeneous networks of proteins/genes. Many community detection algorithms exist, but not all of them are adaptable to the biological context, as these networks are densely connected and biologically relevant modules are quite small. The contribution of this study is threefold. First, we present a comprehensive assessment of many classic community detection algorithms for identifying non-overlapping communities in biological networks, and propose heuristics to identify small, structurally well-defined communities (core modules). We evaluated performance over 180 GWAS datasets. Compared with traditional approaches, our proposed approach identified 50% more disease-relevant modules. Thus, we show that it is important to identify more compact modules for better performance. Next, we sought to understand the peculiar characteristics of disease-enriched modules and why standard community detection algorithms detect so few of them. We performed a comprehensive analysis of the interaction patterns of known disease genes to understand the structure of disease modules, and show that merely treating the set of known disease genes as a module does not give good-quality clusters, as measured by typical metrics such as modularity and conductance. We go on to present a methodology that leverages these known disease genes and also includes their neighboring nodes in a module, to form good-quality clusters and subsequently extract a "gold-standard set" of disease modules. Lastly, we demonstrate, with justification, that "overlapping" community detection algorithms should be the preferred choice for disease module identification, since several genes participate in multiple biological functions.
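The two quality metrics named in the abstract can be illustrated on a toy graph. The sketch below is not the authors' pipeline: the graph, the node names, and the helper `expand_with_neighbors` are hypothetical, and the expansion step naively adds every neighbor of the seed genes, which is only an assumption about what "including neighboring nodes" might look like in its simplest form. It uses the standard definitions of conductance (cut edges divided by the smaller of the two side volumes) and Newman-Girvan modularity.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, module):
    """Cut edges leaving the module, divided by the smaller side's total degree."""
    module = set(module)
    cut = sum(1 for u in module for v in adj[u] if v not in module)
    vol_in = sum(len(adj[u]) for u in module)
    vol_out = sum(len(adj[u]) for u in adj if u not in module)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def modularity(adj, partition):
    """Newman-Girvan modularity of a partition (list of disjoint node sets)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    q = 0.0
    for community in partition:
        community = set(community)
        # Each internal edge is counted twice, once from each endpoint.
        internal = sum(1 for u in community for v in adj[u] if v in community)
        degree = sum(len(adj[u]) for u in community)
        q += internal / two_m - (degree / two_m) ** 2
    return q


def expand_with_neighbors(adj, seeds):
    """Hypothetical, naive expansion: the seed set plus all direct neighbors."""
    expanded = set(seeds)
    for u in seeds:
        expanded |= adj[u]
    return expanded


# Toy network: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]
adj = build_adjacency(edges)

seeds = {"a", "b"}                      # stand-in for "known disease genes"
expanded = expand_with_neighbors(adj, seeds)

print(conductance(adj, seeds))          # 0.5
print(conductance(adj, expanded))       # about 0.143
print(modularity(adj, [expanded, set(adj) - expanded]))  # about 0.357
```

In this toy case the bare seed set has a conductance of 0.5, while the neighbor-expanded set covers a whole triangle and drops to roughly 0.143, which mirrors the abstract's qualitative point that a raw disease gene set need not form a well-separated cluster.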

SUBMITTER: Tripathi B

PROVIDER: S-EPMC6424898 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature


Publications

Adapting Community Detection Algorithms for Disease Module Identification in Heterogeneous Biological Networks.

Tripathi B, Parthasarathy S, Sinha H, Raman K, Ravindran B

Frontiers in Genetics, 2019-03-13

PMID: 30918511

Similar Datasets

Project description: Background: Community detection algorithms are fundamental tools to uncover important features in networks. There are several studies focused on social networks but only a few deal with biological networks. Directly or indirectly, most of the methods maximize modularity, a measure of the density of links within communities as compared to links between communities. Results: Here we analyze six different community detection algorithms, namely, Combo, Conclude, Fast Greedy, Leading Eigen, Louvain and Spinglass, on two important biological networks to find their communities and evaluate the results in terms of topological and functional features through Kyoto Encyclopedia of Genes and Genomes pathway and Gene Ontology term enrichment analysis. At a high level, the main assessment criteria are 1) appropriate community size (neither too small nor too large), 2) representation within the community of only one or two broad biological functions, 3) most genes from the network belonging to a pathway should also belong to only one or two communities, and 4) performance speed. The first network in this study is a network of Protein-Protein Interactions (PPI) in Saccharomyces cerevisiae (Yeast) with 6532 nodes and 229,696 edges and the second is a network of PPI in Homo sapiens (Human) with 20,644 nodes and 241,008 edges. All six methods perform well, i.e., find reasonably sized and biologically interpretable communities, for the Yeast PPI network, but the Conclude method does not find reasonably sized communities for the Human PPI network. The Louvain method maximizes modularity by using an agglomerative approach, and is the fastest method for community detection. For the Yeast PPI network, the results of the Spinglass method are most similar to the results of the Louvain method with regard to the size of communities and core pathways they identify, whereas for the Human PPI network, the Combo and Spinglass methods yield the most similar results, with Louvain being the next closest. Conclusions: For Yeast and Human PPI networks, the Louvain method is likely the best method to find communities in terms of detecting known core pathways in a reasonable time.
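For reference, the modularity that these methods maximize is usually written in the standard Newman-Girvan form below. The notation is an assumption on our part, since the description gives no formula: $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, $c_i$ the community of node $i$, and $\delta$ the Kronecker delta.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

The first term counts links that fall within communities, and the second subtracts the number expected if links were placed at random while preserving node degrees.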

Project description: Modular organization is an emergent property of brain networks, responsible for shaping communication processes and underpinning brain functioning. Moreover, brain networks are intrinsically multilayer since their attributes can vary across time, subjects, frequency, or other domains. Identifying the modular structure in multilayer brain networks represents a gateway toward a deeper understanding of neural processes underlying cognition. Electroencephalographic (EEG) signals, thanks to their high temporal resolution, can give rise to multilayer networks able to follow the dynamics of brain activity. Despite this potential, community organization has not yet been thoroughly investigated in brain networks estimated from EEG. Furthermore, there is still no agreement about which algorithm is the most suitable for detecting communities in multilayer brain networks, and a way to test and compare them all under a variety of conditions is lacking. In this work, we perform a comprehensive analysis of three state-of-the-art algorithms for multilayer community detection (namely, genLouvain, DynMoga, and FacetNet), compared with an approach that applies a single-layer clustering algorithm to each slice of the multilayer network. We test their ability to identify both steady and dynamic modular structures. We statistically evaluate their performance by means of ad hoc benchmark graphs whose properties cover a broad range of conditions in terms of graph density, number of clusters, noise level, and number of layers. The results of this simulation study aim to provide guidelines for choosing the most appropriate algorithm according to the properties of the brain network under examination. Finally, as a proof of concept, we show an application of the algorithms to real functional brain networks derived from EEG signals collected at rest with closed and open eyes. The test on real data provided results in agreement with the conclusions of the simulation study and confirmed the feasibility of multilayer analysis of EEG-based brain networks in both steady and dynamic conditions.
