High-definition reconstruction of clonal composition in cancer.
ABSTRACT: The extensive genetic heterogeneity of cancers can greatly affect therapy success due to the existence of subclonal mutations conferring resistance. However, the characterization of subclones in mixed-cell populations is computationally challenging due to the short length of sequence reads that are generated by current sequencing technologies. Here, we report cloneHD, a probabilistic algorithm for subclone reconstruction from data generated by high-throughput DNA sequencing: read depth, B-allele counts at germline heterozygous loci, and somatic mutation counts. The algorithm can exploit the added information present in correlated longitudinal or multiregion samples and takes into account correlations along genomes caused by events such as copy-number changes. We apply cloneHD to two case studies: a breast cancer sample and time-resolved samples of chronic lymphocytic leukemia, where we demonstrate that monitoring the response of a patient to therapy regimens is feasible. Our work provides new opportunities for tracking cancer development.
Project description:Motivation: DNA sequencing of multiple samples from the same tumor provides data to analyze the process of clonal evolution in the population of cells that give rise to a tumor. Results: We formalize the problem of reconstructing the clonal evolution of a tumor using single-nucleotide mutations as the variant allele frequency (VAF) factorization problem. We derive a combinatorial characterization of the solutions to this problem and show that the problem is NP-complete. We derive an integer linear programming solution to the VAF factorization problem in the case of error-free data and extend this solution to real data with a probabilistic model for errors. The resulting AncesTree algorithm is better able to identify ancestral relationships between individual mutations than existing approaches, particularly in ultra-deep sequencing data when high read counts for mutations yield high confidence VAFs. Availability and implementation: An implementation of AncesTree is available at: http://compbio.cs.brown.edu/software.
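To make the VAF factorization idea above concrete, here is a minimal sketch. It assumes error-free data and heterozygous mutations in diploid regions, so that the VAF matrix can be written as F = ½ · U · B, where B is a binary clone-by-mutation matrix induced by a tree and U holds the mixing proportions of the clones in each sample. The function names and the toy matrices are hypothetical and are not taken from the AncesTree software.

```python
def vaf_matrix(usage, clones):
    """Return F = 0.5 * U * B as a list of rows (samples x mutations).

    usage  -- U, one row per sample, one mixing proportion per clone
    clones -- B, one row per clone, 1 if the clone carries the mutation
    """
    n_mut = len(clones[0])
    return [
        [0.5 * sum(u * clone[j] for u, clone in zip(row, clones))
         for j in range(n_mut)]
        for row in usage
    ]


def may_be_ancestor(vafs, ancestor, descendant, tol=1e-12):
    """Necessary condition under the assumptions above: an ancestral
    mutation has a VAF at least as high as its descendant in every sample."""
    return all(row[ancestor] >= row[descendant] - tol for row in vafs)


# Hypothetical tree: mutation 0 is the founder; mutations 1 and 2
# arise later on two separate branches.
B = [
    [1, 0, 0],  # clone with mutation 0 only
    [1, 1, 0],  # clone with mutations 0 and 1
    [1, 0, 1],  # clone with mutations 0 and 2
]
U = [
    [0.2, 0.5, 0.3],  # sample 1
    [0.6, 0.1, 0.3],  # sample 2
]
F = vaf_matrix(U, B)
```

In this toy example the founder mutation passes the ancestry check against both other mutations, while mutations 1 and 2 fail it in both directions, which is consistent with their placement on separate branches. The inverse problem, recovering U and B from a noisy F, is the hard part that the abstract describes as NP-complete.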
Project description:A central task in developmental biology is to learn the sequence of fate decisions that leads to each mature cell type in a tissue or organism. Recently, clonal labeling of cells using DNA barcodes has emerged as a powerful approach for identifying cells that share a common ancestry of fate decisions. Here we explore the idea that stochasticity of cell fate choice during tissue development could be harnessed to read out lineage relationships after a single step of clonal barcoding. By considering a generalized multitype branching process, we determine the conditions under which the final distribution of barcodes over observed cell types encodes their bona fide lineage relationships. We then propose a method for inferring the order of fate decisions. Our theory predicts a set of symmetries of barcode covariance that serves as a consistency check for the validity of the method. We show that broken symmetries may be used to detect multiple paths of differentiation to the same cell types. We provide computational tools for general use. When applied to barcoding data in hematopoiesis, these tools reconstruct the classical hematopoietic hierarchy and detect couplings between monocytes and dendritic cells and between erythrocytes and basophils that suggest multiple pathways of differentiation for these lineages.
Project description:Motivation: In cancer, clonal evolution is assessed based on information coming from single nucleotide variants and copy number alterations. Nonetheless, existing methods often fail to accurately combine information from both sources to truthfully reconstruct clonal populations in a given tumor sample or in a set of tumor samples coming from the same patient. Moreover, previously published methods detect clones from a single set of variants. As a result, compromises have to be made between stringent variant filtering [reducing dispersion in variant allele frequency estimates (VAFs)] and using all biologically relevant variants. Results: We present a framework for defining cancer clones using the most reliable variants of high depth of coverage and assigning functional mutations to the detected clones. The key element of our framework is QuantumClone, a method for variant clustering into clones based on VAFs, genotypes of corresponding regions and information about tumor purity. We validated QuantumClone and our framework on simulated data. We then applied our framework to whole genome sequencing data for 19 neuroblastoma trios each including constitutional, diagnosis and relapse samples. We confirmed an enrichment of damaging variants within such pathways as MAPK (mitogen-activated protein kinases), neuritogenesis, epithelial-mesenchymal transition, cell survival and DNA repair. Most pathways had more damaging variants in the expanding clones compared to shrinking ones, which can be explained by the increased total number of variants between these two populations. Functional mutational rate varied for ancestral clones and clones shrinking or expanding upon treatment, suggesting changes in clone selection mechanisms at different time points of tumor evolution. Availability and implementation: Source code and binaries of the QuantumClone R package are freely available for download at https://CRAN.R-project.org/package=QuantumClone.
Contact: gudrun.schleiermacher@curie.fr or valentina.boeva@inserm.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
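The abstract above clusters variants using VAFs together with the genotype of the region and tumor purity. The sketch below illustrates why all three are needed, using the commonly used relation between expected VAF and the fraction of tumor cells carrying a variant. It assumes a diploid normal genome, a single tumor copy-number state at the locus, and a known number of mutated copies; the function name is hypothetical and this is not the QuantumClone API.

```python
def cellular_prevalence(vaf, purity, tumor_copies, mutated_copies=1):
    """Estimate the fraction of tumor cells that carry a variant.

    Assumed model (normal cells are diploid and unmutated):
        vaf = purity * mutated_copies * prevalence
              / (purity * tumor_copies + 2 * (1 - purity))
    The function solves this relation for prevalence.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if mutated_copies < 1 or mutated_copies > tumor_copies:
        raise ValueError("mutated_copies must be between 1 and tumor_copies")
    # Average number of allele copies per cell at this locus in the sample.
    mean_copies = purity * tumor_copies + 2 * (1 - purity)
    return vaf * mean_copies / (purity * mutated_copies)


# A heterozygous variant in a diploid region of an 80%-pure sample.
clonal = cellular_prevalence(vaf=0.4, purity=0.8, tumor_copies=2)

# A single-copy variant in a three-copy region of a 50%-pure sample.
subclonal = cellular_prevalence(vaf=0.1, purity=0.5, tumor_copies=3)
```

Under these assumptions the first variant is present in essentially all tumor cells and the second in about half of them, even though their raw VAFs differ by a factor of four. Estimates above 1 usually point to sampling noise or to a wrong copy-number or purity assumption.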
Project description:Cancers arise from successive rounds of mutation and selection, generating clonal populations that vary in size, mutational content and drug responsiveness. Ascertaining the clonal composition of a tumor is therefore important both for prognosis and therapy. Mutation counts and frequencies resulting from next-generation sequencing (NGS) potentially reflect a tumor's clonal composition; however, deconvolving NGS data to infer a tumor's clonal structure presents a major challenge. We propose a generative model for NGS data derived from multiple subsections of a single tumor, and we describe an expectation-maximization procedure for estimating the clonal genotypes and relative frequencies using this model. We demonstrate, via simulation, the validity of the approach, and then use our algorithm to assess the clonal composition of a primary breast cancer and associated metastatic lymph node. After dividing the tumor into subsections, we perform exome sequencing for each subsection to assess mutational content, followed by deep sequencing to precisely count normal and variant alleles within each subsection. By quantifying the frequencies of 17 somatic variants, we demonstrate that our algorithm predicts clonal relationships that are both phylogenetically and spatially plausible. Applying this method to larger numbers of tumors should cast light on the clonal evolution of cancers in space and time.
Project description:Colibacillosis in poultry is a unique disease manifestation of Escherichia coli in the animal world, as one of the primary routes of entry is via the respiratory tract of birds. Because of this, a novel extraintestinal pathogenic E. coli (ExPEC) subpathotype coined avian pathogenic E. coli (or APEC) has been described. Like other ExPEC, this pathotype has been challenging to clearly define, and in the case of APEC, its role as an opportunistic pathogen has further complicated these challenges. Using 3,479 temporally matched genomes of poultry-source isolates, we show that the APEC plasmid, previously considered a defining trait of APEC, is highly prevalent in clinical isolates from diseased turkeys. However, the plasmid is also quite prevalent among cecal E. coli isolates from healthy birds, including both turkeys and broilers. In contrast, we identify distinct differences in clonal backgrounds of turkey clinical versus cecal strains, with a subset of sequence types (STs) dominating the clinical landscape (ST23, ST117, ST131, ST355, and ST428), which are rare within the cecal landscape. Because the same clinical STs have also dominated the broiler landscape, we performed lethality assays using strains from dominant STs from clinical or cecal landscapes in embryonated turkey and chicken eggs. We show that, irrespective of plasmid carriage, dominant clinical STs are significantly more virulent than dominant cecal STs. We present a revised APEC screening tool that incorporates APEC plasmid carriage plus markers for dominant clinical STs. This revised APEC pathotyping tool improves the ability to identify high-risk APEC clones within poultry production systems, and identifies STs of interest for mitigation targets.
Project description:Background: Outcomes in men with National Comprehensive Cancer Network (NCCN) high-risk prostate cancer (PCa) can vary substantially: some will have excellent cancer-specific survival, whereas others will experience early metastasis even after aggressive local treatments. Current nomograms, which yield continuous risk probabilities, do not separate high-risk PCa into distinct sub-strata. Here, we derive a binary definition of very-high-risk (VHR) localized PCa to aid in risk stratification at diagnosis and selection of therapy. Methods: We queried the Johns Hopkins radical prostatectomy database to identify 753 men with NCCN high-risk localized PCa (Gleason sum 8-10, PSA >20 ng/ml, or clinical stage ≥T3). Twenty-eight alternate permutations of adverse grade, stage and cancer volume were compared by their hazard ratios for metastasis and cancer-specific mortality. VHR criteria with top-ranking hazard ratios were further evaluated by multivariable analyses and inclusion of a clinically meaningful proportion of the high-risk cohort. Results: The VHR cohort was best defined by primary pattern 5 present on biopsy, or ≥5 cores with Gleason sum 8-10, or multiple NCCN high-risk features. These criteria encompassed 15.1% of the NCCN high-risk cohort. Compared with other high-risk men, VHR men were at significantly higher risk for metastasis (hazard ratio 2.75) and cancer-specific mortality (hazard ratio 3.44) (P<0.001 for both). Among high-risk men, VHR men also had significantly worse 10-year metastasis-free survival (37% vs 78%) and cancer-specific survival (62% vs 90%). Conclusions: Men who meet VHR criteria form a subgroup within the current NCCN high-risk classification who have particularly poor oncological outcomes. Use of these characteristics to distinguish VHR localized PCa may help in counseling and in selecting optimal candidates for multimodal treatments or clinical trials.
Project description:Multidrug-resistant and highly virulent Klebsiella pneumoniae isolates are emerging, but the clonal groups (CGs) corresponding to these high-risk strains have remained imprecisely defined. We aimed to identify K. pneumoniae CGs on the basis of genome-wide sequence variation and to provide a simple bioinformatics tool to extract virulence and resistance gene data from genomic data. We sequenced 48 K. pneumoniae isolates, mostly of serotypes K1 and K2, and compared the genomes with 119 publicly available genomes. A total of 694 highly conserved genes were included in a core-genome multilocus sequence typing scheme, and cluster analysis of the data enabled precise definition of globally distributed hypervirulent and multidrug-resistant CGs. In addition, we created a freely accessible database, BIGSdb-Kp, to enable rapid extraction of medically and epidemiologically relevant information from genomic sequences of K. pneumoniae. Although drug-resistant and virulent K. pneumoniae populations were largely nonoverlapping, isolates with combined virulence and resistance features were detected.
Project description:BACKGROUND: Bacterial cells during many replication cycles accumulate spontaneous mutations, which result in the birth of novel clones. As a result of this clonal expansion, an evolving bacterial population has different clonal composition over time, as revealed in the long-term evolution experiments (LTEEs). Accurately inferring the haplotypes of novel clones as well as the clonal frequencies and the clonal evolutionary history in a bacterial population is useful for the characterization of the evolutionary pressure on multiple correlated mutations instead of that on individual mutations. RESULTS: In this paper, we study the computational problem of reconstructing the haplotypes of bacterial clones from the variant allele frequencies observed from an evolving bacterial population at multiple time points. We formalize the problem using a maximum likelihood function, which is defined under the assumption that mutations occur spontaneously, and thus the likelihood of a mutation occurring in a specific clone is proportional to the frequency of the clone in the population when the mutation occurs. We develop a series of heuristic algorithms to address the maximum likelihood inference, and show through simulation experiments that the algorithms are fast and achieve near optimal accuracy that is practically plausible under the maximum likelihood framework. We also validate our method using experimental data obtained from a recent study on long-term evolution of Escherichia coli. CONCLUSION: We developed efficient algorithms to reconstruct the clonal evolution history from time course genomic sequencing data. Our algorithm can also incorporate clonal sequencing data to improve the reconstruction results when they are available. Based on the evaluation on both simulated and experimental sequencing data, our algorithms can achieve satisfactory results on the genome sequencing data from long-term evolution experiments.
AVAILABILITY: The program (ClonalTREE) is available as open-source software on GitHub at https://github.com/COL-IU/ClonalTREE.
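The likelihood assumption stated in the abstract above, that a mutation arises in a given clone with probability proportional to that clone's frequency at the time, can be sketched as follows. This is a simplified illustration only: the data layout, function names, and example values are hypothetical, and it is not the ClonalTREE implementation.

```python
import math


def parent_probabilities(freqs):
    """Normalize the frequencies of the clones present when a new
    mutation arises into probabilities of being its parent clone."""
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("at least one existing clone must have positive frequency")
    return {clone: f / total for clone, f in freqs.items()}


def tree_log_likelihood(parents, freqs_at_origin):
    """Log-likelihood of a candidate clonal tree.

    parents         -- maps each new clone to its proposed parent clone
    freqs_at_origin -- maps each new clone to the frequencies of the
                       clones that existed when it arose
    """
    total = 0.0
    for clone, parent in parents.items():
        p = parent_probabilities(freqs_at_origin[clone]).get(parent, 0.0)
        if p == 0.0:
            return float("-inf")  # parent was absent, so the tree is impossible
        total += math.log(p)
    return total


# Hypothetical time course: clone B arises when only A exists;
# clone C arises when A is at 75% and B at 25%.
freqs_at_origin = {"B": {"A": 1.0}, "C": {"A": 0.75, "B": 0.25}}
linear = tree_log_likelihood({"B": "A", "C": "B"}, freqs_at_origin)
branched = tree_log_likelihood({"B": "A", "C": "A"}, freqs_at_origin)
```

With these made-up frequencies the branched tree scores higher than the linear one, because clone A was three times more abundant than clone B when clone C appeared. A full method would also have to infer the haplotypes and frequencies themselves, which this sketch takes as given.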
Project description:Knowledge about the clonal evolution of a tumor can help to interpret the function of its genetic alterations by identifying initiating events and events that contribute to the selective advantage of proliferative, metastatic, and drug-resistant subclones. Clonal evolution can be reconstructed from estimates of the relative abundance (frequency) of subclone-specific alterations in tumor biopsies, which, in turn, inform on its composition. However, estimating these frequencies is complicated by the high genetic instability that characterizes many cancers. Models for genetic instability suggest that copy number alterations (CNAs) can influence mutation-frequency estimates and thus impede efforts to reconstruct tumor phylogenies. Our analysis suggested that accurate mutation frequency estimates require accounting for CNAs, a challenging endeavour when using the genetic profile of a single tumor biopsy. Instead, we propose an optimization algorithm, Chimæra, to account for the effects of CNAs using profiles of multiple biopsies per tumor. Analyses of simulated data and tumor profiles suggested that Chimæra estimates are consistently more accurate than those of previously proposed methods and resulted in improved phylogeny reconstructions and subclone characterizations. Our analyses inferred recurrent initiating mutations in hepatocellular carcinomas, resolved the clonal composition of Wilms' tumors, and characterized the acquisition of mutations in drug-resistant prostate cancers.
Project description:Background: The high-definition standard (HD-standard) scan mode has been proven to display stents better than the standard (STND) scan mode but with more image noise. Deep learning image reconstruction (DLIR) is capable of reducing image noise. This study examined the impact of HD-standard scan mode with DLIR algorithms on stent and coronary artery image quality in coronary computed tomography angiography (CCTA) via a comparison with conventional STND scan mode and adaptive statistical iterative reconstruction-Veo (ASIR-V) algorithms. Methods: The data of 121 patients who underwent HD-standard mode scans (group A: N=47, with coronary stent) or STND mode scans (group B: N=74, without coronary stent) were retrospectively collected. All images were reconstructed with ASIR-V at a level of 50% (ASIR-V50%) and a level of 80% (ASIR-V80%) and with DLIR at medium (DLIR-M) and high (DLIR-H) levels. The noise, signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), artifact index (AI), and in-stent diameter were measured as objective evaluation parameters. Subjective assessment involved a 5-point scale for overall image quality, image noise, stent appearance, stent artifacts, vascular sharpness, and diagnostic confidence. Diagnostic confidence was evaluated based on the presence or absence of significant stenosis (≥50% lumen reduction).
Both subjective and objective evaluations were conducted by two radiologists independently, with kappa and intraclass correlation statistics being used to test the interobserver agreement. Results: There were 76 evaluable stents in group A, and the DLIR-H algorithm significantly outperformed other algorithms, demonstrating the lowest noise (41.6±7.1/41.3±7.2) and AI (32.4±8.9/31.2±10.1), the highest SNR (14.6±3.5/15.0±3.5) and CNR (13.6±3.8/13.9±3.8), and the largest in-stent diameter (2.18±0.61/2.19±0.61) in representing true stent diameter (all P values <0.01), as well as the highest score in each subjective evaluation parameter. In group B, a total of 296 coronary arteries were evaluated, and the DLIR-H algorithm provided the best objective image quality, with statistically superior noise, SNR, and CNR compared with the other algorithms (all P values <0.05). Moreover, the HD-standard mode scan with DLIR provided better image quality and a lower radiation dose than did the STND mode scan with ASIR-V (P<0.01). Conclusions: HD-standard scan mode with DLIR-H improves image quality of both stents and coronary arteries on CCTA under a lower radiation dose.