Dataset Information

Identification and analysis of co-occurrence networks with NetCutter.

ABSTRACT: BACKGROUND: Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. The methodologies and statistical models used to evaluate the significance of association between co-occurring entities are quite diverse, however. METHODOLOGY/PRINCIPAL FINDINGS: We present a general framework for co-occurrence analysis based on a bipartite graph representation of the data, a novel co-occurrence statistic, and software performing co-occurrence analysis as well as generation and analysis of co-occurrence networks. We show that the overall stringency of co-occurrence analysis depends critically on the choice of the null-model used to evaluate the significance of co-occurrence and find that random sampling from a complete permutation set of the bipartite graph permits co-occurrence analysis with optimal stringency. We show that the Poisson-binomial distribution is the most natural co-occurrence probability distribution when vertex degrees of the bipartite graph are variable, which is usually the case. Calculation of Poisson-binomial P-values is difficult, however. Therefore, we propose a fast bi-binomial approximation for calculation of P-values and show that this statistic is superior to other measures of association such as the Jaccard coefficient and the uncertainty coefficient. Furthermore, co-occurrence analysis of more than two entities can be performed using the same statistical model, which leads to increased signal-to-noise ratios, robustness towards noise, and the identification of implicit relationships between co-occurring entities. Using NetCutter, we identify a novel protein biosynthesis related set of genes that are frequently coordinately deregulated in human cancer related gene expression studies. NetCutter is available at http://bio.ifom-ieo-campus.it/NetCutter/). CONCLUSION: Our approach can be applied to any set of categorical data where co-occurrence analysis might reveal functional relationships such as clinical parameters associated with cancer subtypes or SNPs associated with disease phenotypes. The stringency of our approach is expected to offer an advantage in a variety of applications.

SUBMITTER: Muller H

PROVIDER: S-EPMC2526157 | biostudies-literature | 2008

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Identification and analysis of co-occurrence networks with NetCutter.

Müller Heiko H Mancuso Francesco F

PloS one 20080910 9

<h4>Background</h4>Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. The methodologies and statistical models used to evaluate the significance of association between co-occurring entities are quite diverse, however.<h4>Methodology/principal findings</h4>We present a general framework for co-occurrence analysis based on a bipartite graph representation of the data, a novel co-occurrence statistic, and software performing co-occurrenc ...[more]

PMID: 18781200

Similar Datasets

Project description:Neuromuscular disorders (NMDs) represent an important subset of rare diseases associated with elevated morbidity and mortality whose diagnosis can take years. Here we present a novel approach using systems biology to produce functionally-coherent phenotype clusters that provide insight into the cellular functions and phenotypic patterns underlying NMDs, using the Human Phenotype Ontology as a common framework. Gene and phenotype information was obtained for 424 NMDs in OMIM and 126 NMDs in Orphanet, and 335 and 216 phenotypes were identified as typical for NMDs, respectively. 'Elevated serum creatine kinase' was the most specific to NMDs, in agreement with the clinical test of elevated serum creatinine kinase that is conducted on NMD patients. The approach to obtain co-occurring NMD phenotypes was validated based on co-mention in PubMed abstracts. A total of 231 (OMIM) and 150 (Orphanet) clusters of highly connected co-occurrent NMD phenotypes were obtained. In parallel, a tripartite network based on phenotypes, diseases and genes was used to associate NMD phenotypes with functions, an approach also validated by literature co-mention, with KEGG pathways showing proportionally higher overlap than Gene Ontology and Reactome. Phenotype-function pairs were crossed with the co-occurrent NMD phenotype clusters to obtain 40 (OMIM) and 72 (Orphanet) functionally coherent phenotype clusters. As expected, many of these overlapped with known diseases and confirmed existing knowledge. Other clusters revealed interesting new findings, indicating informative phenotypes for differential diagnosis, providing deeper knowledge of NMDs, and pointing towards specific cell dysfunction caused by pleiotropic genes. This work is an example of reproducible research that i) can help better understand NMDs and support their diagnosis by providing a new tool that exploits existing information to obtain novel clusters of functionally-related phenotypes, and ii) takes us another step towards personalised medicine for NMDs.

Dataset Information

Identification and analysis of co-occurrence networks with NetCutter.

Publications

Identification and analysis of co-occurrence networks with NetCutter.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets