Dataset Information

Comparison of large networks with sub-sampling strategies.


ABSTRACT: Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free, topology-based network similarity scores allow us to analyse large sets of networks containing different types and sizes of data. Netdis is one such score; it defines network similarity through the counts of small sub-graphs in the local neighbourhoods of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods, which links naturally with the framework of network comparison through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets.
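
The idea of sampling neighbourhoods and counting small sub-graphs in each can be illustrated with a minimal sketch. This is not the Netdis statistic itself: it counts only triangles, draws neighbourhood centres uniformly at random rather than selecting similar-sized neighbourhoods, and omits the comparison between networks. The function names, the `radius` and `fraction` parameters, and the adjacency-dictionary representation are all assumptions made for illustration.

```python
import random
from itertools import combinations


def ego_nodes(adj, centre, radius=2):
    """Return the set of nodes within `radius` hops of `centre` (breadth-first)."""
    seen = {centre}
    frontier = {centre}
    for _ in range(radius):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen


def triangle_count(adj, nodes):
    """Count triangles in the sub-graph induced by `nodes`.

    Each triangle is counted once, at its smallest node label.
    """
    count = 0
    for u in nodes:
        larger = [v for v in adj[u] if v in nodes and v > u]
        for v, w in combinations(larger, 2):
            if w in adj[v]:
                count += 1
    return count


def sampled_triangle_counts(adj, fraction=0.1, radius=2, seed=0):
    """Count triangles in the neighbourhoods of a random sample of nodes.

    `fraction` is the share of nodes used as neighbourhood centres
    (hypothetical parameter; 0.1 mirrors the 10% figure in the abstract).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    k = max(1, round(fraction * len(nodes)))
    centres = rng.sample(nodes, k)
    return {c: triangle_count(adj, ego_nodes(adj, c, radius)) for c in centres}


# Toy input: the complete graph on four nodes, as an undirected adjacency dict.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
counts = sampled_triangle_counts(k4, fraction=0.5, radius=1)
```

In this toy graph every one-step neighbourhood is the whole graph, so each sampled centre reports the same triangle count. On a real network the per-neighbourhood counts would differ, and a score in the spirit of Netdis would compare their distribution across networks.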

SUBMITTER: Ali W 

PROVIDER: S-EPMC4933923 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC10148754 | biostudies-literature
| S-EPMC6376652 | biostudies-literature
| S-EPMC4306386 | biostudies-literature
| S-EPMC4448827 | biostudies-other
| S-EPMC2213691 | biostudies-literature
| S-EPMC9680323 | biostudies-literature
| S-EPMC6154981 | biostudies-literature
| S-EPMC2039832 | biostudies-other
| S-EPMC4634733 | biostudies-literature
| S-EPMC6731643 | biostudies-literature