Dataset Information

VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

ABSTRACT: A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on the mutation frequencies of single genes; they lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits a generalized additive model for each sample based on its sample-specific mutation profile and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to more than 300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
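The abstract names Random Walk with Restart (RWR) as the step that selects interactors of mutated genes in a protein-protein interaction network. The following is a minimal sketch of generic RWR on a toy undirected network, not VarWalker's actual implementation. The function name, the restart probability of 0.5, the convergence tolerance, and the gene labels are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that starts at
    the seed nodes and, at every step, jumps back to them with probability
    `restart`. `adjacency` maps each node to the set of its neighbours."""
    nodes = sorted(adjacency)
    seeds = set(seeds)
    # Initial distribution: uniform over the seed (mutated) genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction of the walker's mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            degree = len(adjacency[n])
            if degree == 0:
                # Isolated node: the remaining mass stays where it is.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            # Walk term: the rest is split evenly among the neighbours.
            share = (1.0 - restart) * p[n] / degree
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain geneA - geneB - geneC - geneD.
toy_network = {
    "geneA": {"geneB"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneB", "geneD"},
    "geneD": {"geneC"},
}
scores = random_walk_with_restart(toy_network, seeds=["geneA"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the scores fall off with distance from the seed, which is the property that lets a network method pick out close interactors of a mutated gene. How VarWalker thresholds or optimizes these scores is not described on this page.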

SUBMITTER: Jia P

PROVIDER: S-EPMC3916227 | biostudies-literature | 2014 Feb

REPOSITORIES: biostudies-literature

Publications

VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Peilin Jia, Zhongming Zhao

PLoS Computational Biology, 2014 Feb 6 (issue 2)

PMID: 24516372

Similar Datasets

Project description: BACKGROUND: Pathway analysis of a set of genes represents an important area in large-scale omic data analysis. However, the application of traditional pathway enrichment methods to next-generation sequencing (NGS) data is prone to several potential biases, including genomic/genetic factors (e.g., the particular disease and gene length) and environmental factors (e.g., personal lifestyle and the frequency and dosage of exposure to mutagens). Therefore, novel methods are urgently needed for these new data types, especially for individual-specific genome data. METHODOLOGY: In this study, we proposed a novel method for the pathway analysis of NGS mutation data that explicitly takes into account the gene-wise mutation rate. We estimated the gene-wise mutation rate based on the individual-specific background mutation rate along with the gene length. Taking the mutation rate as a weight for each gene, our weighted resampling strategy builds the null distribution for each pathway while matching the gene length patterns. The empirical P value obtained then provides an adjusted statistical evaluation. PRINCIPAL FINDINGS/CONCLUSIONS: We applied our weighted resampling method to a lung adenocarcinoma dataset and a glioblastoma dataset, and compared it with other widely applied methods. By explicitly adjusting for gene length, the weighted resampling method performs as well as the standard methods for significant pathways with strong evidence. Importantly, our method could effectively reject many marginally significant pathways detected by standard methods, including several long-gene-based, cancer-unrelated pathways. We further demonstrated that by reducing such biases, pathway crosstalk for each individual and the pathway co-mutation map across multiple individuals can be objectively explored and evaluated. This method performs pathway analysis in a sample-centered fashion and provides an alternative approach to the accurate analysis of cancer-personalized genomes. It can be extended to other types of genomic data (genotyping and methylation) that have similar bias problems.
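The weighted resampling idea described above can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function names, the number of resamples, the use of exponential arrival times to draw a weighted sample without replacement, and all gene labels and lengths are assumptions.

```python
import random


def weighted_sample_without_replacement(weights, k, rng):
    """Draw k distinct genes with probability proportional to their weights.
    Each gene gets an exponential arrival time with rate equal to its weight;
    the k earliest arrivals form the sample."""
    arrivals = sorted((rng.expovariate(w), g) for g, w in weights.items() if w > 0)
    return {g for _, g in arrivals[:k]}


def pathway_empirical_p(mutated, pathway, gene_length, background_rate,
                        n_resamples=2000, seed=0):
    """Empirical P value for the overlap between one sample's mutated genes
    and a pathway, against a null that favours long genes."""
    rng = random.Random(seed)
    # Assumed gene-wise weight: sample-specific background rate times length.
    # Within a single sample only the relative weights affect the draw.
    weights = {g: background_rate * length for g, length in gene_length.items()}
    mutated, pathway = set(mutated), set(pathway)
    observed = len(mutated & pathway)
    hits = 0
    for _ in range(n_resamples):
        null_genes = weighted_sample_without_replacement(weights, len(mutated), rng)
        if len(null_genes & pathway) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical genome: 5 long genes and 45 short genes.
lengths = {f"long{i}": 100_000 for i in range(5)}
lengths.update({f"short{i}": 1_000 for i in range(45)})
long_pathway = {f"long{i}" for i in range(5)}
short_pathway = {f"short{i}" for i in range(5)}

p_long = pathway_empirical_p(["long0", "long1", "long2"], long_pathway, lengths, 1e-6)
p_short = pathway_empirical_p(["short0", "short1", "short2"], short_pathway, lengths, 1e-6)
```

Both toy pathways have the same size and the same observed overlap of three genes, yet the long-gene pathway receives a much larger P value because long genes are expected to be hit often by chance. This mirrors the claim that long-gene-based pathways can be rejected once gene length is accounted for.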

Project description: Accurate normalization of gene expression assays using housekeeping genes (HKGs) is critically necessary. To do so, selecting a proper set of HKGs for a specific experiment is of great importance. Despite many studies, there is no consensus on a suitable set of HKGs for use in quantitative real-time PCR analyses of chicken tissues. A limited number of HKGs have been widely used. However, relying on a small number of HKGs for all tissues is problematic. The emergence of high-throughput RNA-seq gene expression data has enabled the simultaneous comparison of the stability of multiple HKGs. Therefore, using the average coefficient of variation across at least three datasets per tissue, we sorted all reliably expressed genes (REGs; with FPKM ≥ 1 in at least one sample) and introduced the top 10 most suitable and stable reference genes for each of the 16 chicken tissues. We evaluated the consistency of the results for five tissues using the same methodology on other datasets. Furthermore, we assessed 96 previously widely used HKGs (WU-HKGs) in order to challenge the accuracy of the previous studies. The New Tuxedo software suite was used for the main analyses. The results revealed novel, different sets of reference genes for each of the tissues, with 17 genes shared among the top-10 lists of the 16 tissues. The results did not support the suitability of WU-HKGs such as Actb, Ldha, Scd, B2m, and Hprt1 for any of the tissues examined. In contrast, a total of 6, 13, 14, 23, and 32 validated housekeeping genes (V-HKGs) were identified as the most stable and suitable reference genes for muscle, spleen, liver, heart, and kidney tissues, respectively. Although we identified a few new HKGs usable for multiple tissues, the selection of suitable HKGs must be tissue specific. The newly introduced reference genes from the present study, despite lacking experimental validation, can contribute to more accurate normalization in future expression analyses of chicken genes.
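The ranking step described above (average coefficient of variation across datasets, restricted to genes with FPKM ≥ 1 in at least one sample) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the use of the population standard deviation, and the toy gene labels and FPKM values are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean; lower means more stable."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else float("inf")


def rank_reference_genes(datasets, min_fpkm=1.0, top_n=10):
    """Rank genes by their average CV across datasets, most stable first.
    `datasets` is a list of dicts mapping gene -> list of FPKM values."""
    shared = set.intersection(*(set(d) for d in datasets))
    ranked = []
    for gene in shared:
        # Keep only reliably expressed genes: FPKM >= min_fpkm in some sample.
        if not any(v >= min_fpkm for d in datasets for v in d[gene]):
            continue
        avg_cv = mean([coefficient_of_variation(d[gene]) for d in datasets])
        ranked.append((avg_cv, gene))
    ranked.sort()
    return [gene for _, gene in ranked[:top_n]]


# Hypothetical FPKM values for one tissue across three datasets.
toy_datasets = [
    {"stable": [10.0, 10.5, 9.5], "variable": [1.0, 20.0, 50.0], "low": [0.1, 0.2, 0.1]},
    {"stable": [12.0, 11.0, 13.0], "variable": [5.0, 40.0, 2.0], "low": [0.3, 0.1, 0.2]},
    {"stable": [8.0, 8.5, 9.0], "variable": [30.0, 3.0, 60.0], "low": [0.2, 0.2, 0.4]},
]
top_genes = rank_reference_genes(toy_datasets)
```

Averaging the CV over several datasets rewards genes that are stable in every dataset instead of in just one, which fits the requirement of at least three datasets per tissue.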
