
Dataset Information


Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data.


ABSTRACT: BACKGROUND: High-throughput experiments have produced many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed that work on various types of genomic data sources. Because a single source of genomic data is prone to bias, incompleteness and noise, integrating different genomic data sources is in high demand for reliable disease gene identification. RESULTS: In contrast to the commonly adopted data integration approach, which integrates separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which can hold multiple edges between a pair of nodes. This novel approach provides a data platform with strong noise tolerance for prioritizing disease genes. A new random walk method is then developed to work on multigraphs, using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. Compared on benchmark datasets, our method is shown to be more accurate than state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature. CONCLUSIONS: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization.
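
The abstract does not spell out the modified transition-matrix step, so the following is only a minimal sketch of the general idea of random walk with restart on a multigraph, not the authors' RWRM implementation. It assumes that parallel edges from different networks simply add up before column normalization; the function names, the toy networks and the restart probability of 0.7 are all illustrative choices.

```python
import numpy as np

def multigraph_transition(adjacencies):
    """Build a column-stochastic transition matrix from several networks.

    Assumption (not taken from the paper): an edge present in k networks
    gets weight k, i.e. parallel edges are summed.
    """
    total = np.sum(adjacencies, axis=0).astype(float)
    col_sums = total.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    return total / col_sums

def random_walk_with_restart(W, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol."""
    p0 = np.zeros(W.shape[0])
    p0[seeds] = 1.0 / len(seeds)  # walker restarts uniformly at the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Two hypothetical 4-gene networks sharing the edge 0-1.
net_a = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
net_b = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])

W = multigraph_transition([net_a, net_b])
scores = random_walk_with_restart(W, seeds=[0])
ranking = np.argsort(-scores)  # candidate genes ordered by proximity to the seed
```

In this toy setup gene 0 plays the role of a known disease gene, and the steady-state scores rank the remaining genes by their network proximity to it. The cross-walk between phenotype and gene networks described in the abstract is not covered by this sketch.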

SUBMITTER: Li Y 

PROVIDER: S-EPMC3521411 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data.

Li Yongjin, Li Jinyan

BMC Genomics, 2012-12-13



Similar Datasets

| S-EPMC6967737 | biostudies-literature
| S-EPMC8384471 | biostudies-literature
| S-EPMC6036753 | biostudies-literature
| S-EPMC8417042 | biostudies-literature
| S-EPMC5589230 | biostudies-literature
| S-EPMC6385311 | biostudies-literature
| S-EPMC5472867 | biostudies-literature
| S-EPMC8729064 | biostudies-literature
| S-EPMC5308635 | biostudies-literature
| S-EPMC6832386 | biostudies-literature