Dataset Information

Ranking candidate disease genes from gene expression and protein interaction: a Katz-centrality based approach.

ABSTRACT: Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases, but picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data on gene expression level, protein-protein interaction strength, and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed between cases and controls, or if it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, co-occurrence, and shared pathophysiology between different disorders.
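
As a rough illustration of the two assumptions in the abstract, the following Python sketch scores genes with a Katz-style recursion x = prior + alpha * W x, where the prior stands in for a differential-expression score and W for protein interaction strengths. The function name katz_scores, the toy genes A to D, the weights, and alpha = 0.1 are all hypothetical choices made for illustration; the paper's exact formulation and parameters may differ.

```python
def katz_scores(weights, prior, alpha=0.1, tol=1e-10, max_iter=1000):
    """Iterate x <- prior + alpha * W x until the largest change is below tol.

    weights: dict gene -> {neighbor: interaction strength}, assumed symmetric
    prior:   dict gene -> non-negative score (e.g. differential expression)
    alpha:   attenuation factor; must be smaller than 1 / (largest eigenvalue
             of W) for the iteration to converge
    """
    genes = list(prior)
    x = dict(prior)
    for _ in range(max_iter):
        new_x = {
            g: prior[g]
            + alpha * sum(w * x[n] for n, w in weights.get(g, {}).items())
            for g in genes
        }
        delta = max(abs(new_x[g] - x[g]) for g in genes)
        x = new_x
        if delta < tol:
            return x
    raise RuntimeError("did not converge; try a smaller alpha")


# Toy, made-up interaction network: a chain A - B - C - D.
weights = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 0.5},
    "C": {"B": 0.5, "D": 1.0},
    "D": {"C": 1.0},
}
# Made-up expression evidence: A is strongly, D weakly differentially expressed.
prior = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.2}

scores = katz_scores(weights, prior)
# Rank candidates from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup, genes B and C have no expression evidence of their own and receive a score only through their neighbors, which is the "closeness to other candidates" idea; a smaller alpha weights direct expression evidence more heavily relative to network proximity.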

SUBMITTER: Zhao J

PROVIDER: S-EPMC3166320 | biostudies-literature | 2011

REPOSITORIES: biostudies-literature

Publications

Ranking candidate disease genes from gene expression and protein interaction: a Katz-centrality based approach.

Zhao J, Yang TH, Huang Y, Holme P

PLoS ONE, 2011-09-02, issue 9

PMID: 21912686

Similar Datasets

Project description: Multi-morbidity, the health state of having two or more concurrent chronic conditions, is becoming more common as populations age, but is poorly understood. Identifying and understanding commonly occurring sets of diseases is important to inform clinical decisions to improve patient services and outcomes. Network analysis has previously been used to investigate multi-morbidity, but a classic application only allows information on binary sets (pairs) of diseases to contribute to the graph. We propose the use of hypergraphs, which allow for the incorporation of data on people with any number of conditions, and also allow us to obtain a quantitative understanding of the centrality (a measure of how well connected items in the network are to each other) of both single diseases and sets of conditions. We illustrate the framework's application with the set of conditions described in the Charlson morbidity index, using data extracted from routinely collected population-scale, patient-level electronic health records (EHR) for a cohort of adults in Wales, UK. Stroke and diabetes were found to be the most central single conditions. The most central pairs of diseases all featured diabetes: diabetes with chronic pulmonary disease, renal disease, congestive heart failure, and cancer. We investigated the differences between results obtained from the hypergraph and a classic binary graph and found that the centrality of diseases such as paraplegia, which are connected strongly to a single other disease, is exaggerated in binary graphs compared to hypergraphs. The measure of centrality is derived from the weighting metrics calculated for disease sets, and further investigation is needed to better understand the effect of the metric used in identifying the clinical significance and ranked centrality of grouped diseases.
These initial results indicate that hypergraphs can be used as a valuable tool for analysing previously poorly understood relationships and information available in EHR data.
