Dataset Information

A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data.

ABSTRACT:

Motivation

Tumours are composed of distinct cancer cell populations (clones), which continuously adapt to their local micro-environment. Standard methods for clonal deconvolution seek to identify groups of mutations and estimate the prevalence of each group in the tumour, while accounting for its purity and copy number profile. These methods have been applied to cross-sectional data and to longitudinal data after discarding information on the timing of sample collection. Two key questions are how we can incorporate such information in our analyses and whether there is any benefit in doing so.

Results

We developed a clonal deconvolution method that explicitly incorporates the temporal spacing of longitudinally sampled tumours. By merging a Dirichlet Process Mixture Model with Gaussian Process priors and using as input a sequence of several sparsely collected samples, our method can reconstruct the temporal profile of the abundance of any mutation cluster supported by the data as a continuous function of time. We benchmarked our method on whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma, and on synthetic data. We found that incorporating information on the timing of tissue collection improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. Thus, our approach is particularly useful when collecting a relatively long sequence of tumour samples is feasible, as in liquid cancers (e.g. leukaemia) and liquid biopsies.
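
To make the idea of combining a Dirichlet Process mixture with Gaussian Process priors more concrete, here is a minimal sketch in Python. It is not the clonosGP implementation: the function names, the squared-exponential kernel, the logistic link, the truncation level and the collection times are all assumptions chosen for illustration. It only shows how unevenly spaced collection times could enter a prior over cluster abundances through a time-dependent covariance.

```python
import numpy as np

def rbf_kernel(times, amplitude=1.0, lengthscale=1.0):
    """Squared-exponential covariance between sample collection times."""
    t = np.asarray(times, dtype=float)
    diff = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / lengthscale) ** 2)

def stick_breaking(betas):
    """Truncated stick-breaking weights; they sum to at most 1."""
    betas = np.asarray(betas, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_cluster_trajectories(times, n_clusters, rng, lengthscale=1.0):
    """Draw one latent GP path per cluster and squash it into (0, 1)."""
    K = rbf_kernel(times, lengthscale=lengthscale)
    K += 1e-8 * np.eye(len(times))  # jitter for numerical stability
    latent = rng.multivariate_normal(np.zeros(len(times)), K, size=n_clusters)
    return 1.0 / (1.0 + np.exp(-latent))  # assumed logistic link

rng = np.random.default_rng(0)
times = [0.0, 0.5, 3.0, 3.2, 9.0]  # hypothetical, unevenly spaced time points
K = rbf_kernel(times)
weights = stick_breaking(rng.beta(1.0, 2.0, size=4))  # prior cluster weights
phi = sample_cluster_trajectories(times, n_clusters=4, rng=rng)
```

In this sketch, samples collected close together in time (such as 3.0 and 3.2) receive strongly correlated abundances, while distant samples are nearly independent. A model that discards the timing of collection would treat all pairs of samples alike.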

Availability and implementation

The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Vavoulis DV

PROVIDER: S-EPMC8055230 | biostudies-literature | 2021 Apr

REPOSITORIES: biostudies-literature

Publications

A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data.

Vavoulis DV, Cutts A, Taylor JC, Schuh A

Bioinformatics (Oxford, England), 1 April 2021, issue 2

PMID: 32722772

Similar Datasets

Project description: BACKGROUND: Excretory/secretory proteins (ESPs) play a major role in parasitic infection as they are present at the host-parasite interface and regulate the host immune system. In the case of parasitic helminths, transcriptomics has been used extensively to understand the molecular basis of parasitism and to develop novel therapeutic strategies against parasitic infections. However, none of the transcriptomic studies have extensively covered ES protein prediction for identifying novel therapeutic targets, especially as parasites adopt non-classical secretion pathways. RESULTS: We developed a semi-automated computational approach for prediction and annotation of ES proteins using transcriptomic data from next generation sequencing platforms. For the prediction of non-classically secreted proteins, we used an improved computational strategy, together with homology matching to a dataset of experimentally determined parasitic helminth ES proteins. We applied this protocol to analyse 454 short reads of the parasitic nematode Strongyloides ratti. From 296231 reads, we derived 28901 contigs, which were translated into 20877 proteins. Based on our improved ES protein prediction pipeline, we identified 2572 ES proteins, of which 407 (1.9%) proteins have classical N-terminal signal peptides, 923 (4.4%) were computationally identified as non-classically secreted, while 1516 (7.26%) were identified by homology to experimentally identified parasitic helminth ES proteins. Out of 2572 ES proteins, 2310 (89.8%) had homologues in the free-living nematode Caenorhabditis elegans and 2220 (86.3%) in parasitic nematodes. We could functionally annotate 1591 (61.8%) ES proteins with protein families and domains and establish pathway associations for 691 (26.8%) proteins. In addition, we identified 19 representative ES proteins, which have no homologues in the host organism but are homologous to lethal RNAi phenotypes in C. elegans, as potential therapeutic targets. CONCLUSION: We report a comprehensive approach using freely available computational tools for the secretome analysis of NGS data. This approach has been applied to S. ratti 454 transcriptomic data for in silico excretory/secretory protein prediction and analysis, providing a foundation for developing new therapeutic solutions for parasitic infections.

Project description: BACKGROUND/AIMS: Recently, next-generation sequencing-based technologies have enabled DNA methylation profiling at high resolution and low cost. Methyl-Seq and Reduced Representation Bisulfite Sequencing (RRBS) are two such technologies that interrogate methylation levels at CpG sites throughout the entire human genome. With the rapid reduction of sequencing costs, these technologies will enable epigenotyping of large cohorts for phenotypic association studies. Existing quantification methods for sequencing-based methylation profiling are simplistic and do not deal with the noise due to the random sampling nature of sequencing and various experimental artifacts. Therefore, there is a need to investigate the statistical issues related to the quantification of methylation levels for these emerging technologies, with the goal of developing an accurate quantification method. METHODS: In this paper, we propose two methods for Methyl-Seq quantification. The first method, the Maximum Likelihood estimate, is both conceptually intuitive and computationally simple. However, this estimate is biased at extreme methylation levels and does not provide variance estimation. The second method, based on a Bayesian hierarchical model, allows variance estimation of methylation levels and provides a flexible framework to adjust for technical bias in the sequencing process. RESULTS: We compare the previously proposed binary method, the Maximum Likelihood (ML) method, and the Bayesian method. In both simulation and real data analysis of Methyl-Seq data, the Bayesian method offers the most accurate quantification. The ML method is slightly less accurate than the Bayesian method, but both of our proposed methods outperform the original binary method in Methyl-Seq. In addition, we applied these quantification methods to simulation data and show that, with sequencing depth above 40-300 (which varies with different tissue samples) per cleavage site, Methyl-Seq offers quantification consistency comparable to that of microarrays.
