Dataset Information

MixClone: a mixture model for inferring tumor subclonal populations.

ABSTRACT: Tumor genomes are often highly heterogeneous, consisting of genomes from multiple subclonal types. Complete characterization of all subclonal types is a fundamental need in tumor genome analysis. With the advancement of next-generation sequencing, computational methods have recently been developed to infer tumor subclonal populations directly from cancer genome sequencing data. Most of these methods are based on sequence information from somatic point mutations, However, the accuracy of these algorithms depends crucially on the quality of the somatic mutations returned by variant calling algorithms, and usually requires a deep coverage to achieve a reasonable level of accuracy.We describe a novel probabilistic mixture model, MixClone, for inferring the cellular prevalences of subclonal populations directly from whole genome sequencing of paired normal-tumor samples. MixClone integrates sequence information of somatic copy number alterations and allele frequencies within a unified probabilistic framework. We demonstrate the utility of the method using both simulated and real cancer sequencing datasets, and show that it significantly outperforms existing methods for inferring tumor subclonal populations. The MixClone package is written in Python and is publicly available at https://github.com/uci-cbcl/MixClone.The probabilistic mixture model proposed here provides a new framework for subclonal analysis based on cancer genome sequencing data. By applying the method to both simulated and real cancer sequencing data, we show that integrating sequence information from both somatic copy number alterations and allele frequencies can significantly improve the accuracy of inferring tumor subclonal populations.

SUBMITTER: Li Y

PROVIDER: S-EPMC4331709 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

MixClone: a mixture model for inferring tumor subclonal populations.

Li Yi Y Xie Xiaohui X

BMC genomics 20150121

<h4>Background</h4>Tumor genomes are often highly heterogeneous, consisting of genomes from multiple subclonal types. Complete characterization of all subclonal types is a fundamental need in tumor genome analysis. With the advancement of next-generation sequencing, computational methods have recently been developed to infer tumor subclonal populations directly from cancer genome sequencing data. Most of these methods are based on sequence information from somatic point mutations, However, the a ...[more]

PMID: 25707430

Similar Datasets

Project description:Multistage tumorigenesis is a dynamic process characterized by the accumulation of mutations. Thus, a tumor mass is composed of genetically divergent cell subclones. With the advancement of next-generation sequencing (NGS), mathematical models have been recently developed to decompose tumor subclonal architecture from a collective genome sequencing data. Most of the methods focused on single-nucleotide variants (SNVs). However, somatic copy number aberrations (CNAs) also play critical roles in carcinogenesis. Therefore, further modeling subclonal CNAs composition would hold the promise to improve the analysis of tumor heterogeneity and cancer evolution. To address this issue, we developed a two-way mixture Poisson model, named CloneDeMix for the deconvolution of read-depth information. It can infer the subclonal copy number, mutational cellular prevalence (MCP), subclone composition, and the order in which mutations occurred in the evolutionary hierarchy. The performance of CloneDeMix was systematically assessed in simulations. As a result, the accuracy of CNA inference was nearly 93% and the MCP was also accurately restored. Furthermore, we also demonstrated its applicability using head and neck cancer samples from TCGA. Our results inform about the extent of subclonal CNA diversity, and a group of candidate genes that probably initiate lymph node metastasis during tumor evolution was also discovered. Most importantly, these driver genes are located at 11q13.3 which is highly susceptible to copy number change in head and neck cancer genomes. This study successfully estimates subclonal CNAs and exhibit the evolutionary relationships of mutation events. By doing so, we can track tumor heterogeneity and identify crucial mutations during evolution process. Hence, it facilitates not only understanding the cancer development but finding potential therapeutic targets. Briefly, this framework has implications for improved modeling of tumor evolution and the importance of inclusion of subclonal CNAs.

Dataset Information

MixClone: a mixture model for inferring tumor subclonal populations.

Publications

MixClone: a mixture model for inferring tumor subclonal populations.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets