Dataset Information

Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors?

ABSTRACT: Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are positively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurrently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ~4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold between sites.
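The "more often than one would expect by chance" comparison in the abstract can be illustrated with a rough null expectation. The following is a minimal sketch, not the authors' method: it assumes every site has the same mutation rate and uses a Poisson approximation, so it ignores the adjacent-nucleotide effects that the study accounts for. The function name `expected_recurrent_sites` and the genome size of 3 billion sites are illustrative assumptions, not values taken from the record.

```python
import math


def expected_recurrent_sites(n_snvs, n_sites, min_hits=2):
    """Expected number of sites hit at least `min_hits` times.

    Assumption (not from the source): SNVs fall uniformly at random,
    so hits per site are approximately Poisson with mean n_snvs / n_sites.
    """
    lam = n_snvs / n_sites
    # Probability that a given site receives fewer than `min_hits` hits.
    p_below = sum(
        math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_hits)
    )
    return n_sites * (1.0 - p_below)


# Hypothetical inputs: 3 million SNVs (the lower bound quoted in the abstract)
# spread over an assumed 3 billion sites.
null_expectation = expected_recurrent_sites(3_000_000, 3_000_000_000)
print(f"Sites expected to be hit 2+ times under the uniform null: {null_expectation:.0f}")
```

An observed count of recurrently hit sites far above such a null expectation would point to rate variation between sites, whether from selection, hypermutability, or error, which is the distinction the study goes on to examine.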

SUBMITTER: Smith TC

PROVIDER: S-EPMC5036107 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors?

Smith TC, Carr AM, Eyre-Walker AC

PeerJ, 2016-09-20

PMID: 27688957

Similar Datasets

PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes.
| S-EPMC5753232 | biostudies-literature

Multiplex padlock targeted sequencing reveals human hypermutable CpG variations.
| S-EPMC2752131 | biostudies-literature

Sequencing abasic sites in DNA at single-nucleotide resolution.
| S-EPMC6589398 | biostudies-literature

Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies.
| S-EPMC3961190 | biostudies-literature

Misannotation of multiple-nucleotide variants risks misdiagnosis.
| S-EPMC6957021 | biostudies-literature

Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data.
| S-EPMC4602202 | biostudies-literature

Precise detection of de novo single nucleotide variants in human genomes.
| S-EPMC6003530 | biostudies-literature

Mapping Causal Variants with Single-Nucleotide Resolution Reveals Biochemical Drivers of Phenotypic Change.
| S-EPMC5788306 | biostudies-literature

Simultaneous rapid sequencing of multiple RNA virus genomes.
| S-EPMC7119728 | biostudies-literature

SNVHMM: predicting single nucleotide variants from next generation sequencing.
| S-EPMC3718670 | biostudies-literature