Dataset Information

Assessing the reproducibility of exome copy number variations predictions.

ABSTRACT:

Background

Reproducibility is receiving increased attention across many domains of science, and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes, and a major challenge is their reproducibility in other datasets. Here we test exome CNV calling reproducibility under three conditions: data generated by different sequencing centers; varying sample sizes; and varying capture methodology.

Methods

Four CNV tools were tested: eXome Hidden Markov Model (XHMM), Copy Number Inference From Exome Reads (CoNIFER), EXCAVATOR, and Copy Number Analysis for Targeted Resequencing (CONTRA). To examine reproducibility, we ran the callers on four datasets, on varying sample sizes (N = 10, 30, 75, 100, and 300), and on data generated with different capture methodologies. We examined the false negative (FN) and false positive (FP) calls for potential limitations of the CNV callers. The positive predictive value (PPV) was measured by checking CNV call concordance against single nucleotide polymorphism (SNP) array data.
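As an illustration of the PPV measurement described above, here is a minimal Python sketch that counts an exome CNV call as confirmed when it overlaps an array-derived CNV of the same type. The abstract does not state the concordance criterion, so the 50% overlap threshold, the requirement that deletion/duplication types match, and every name in the code (`Interval`, `overlap_fraction`, `ppv`) are assumptions made for this example, not the authors' method. The coordinates are toy values.

```python
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def overlap_fraction(call: Interval, ref: Interval) -> float:
    """Fraction of the call's length covered by a reference CNV of the same type."""
    if call.chrom != ref.chrom or call.kind != ref.kind:
        return 0.0
    shared = min(call.end, ref.end) - max(call.start, ref.start)
    return max(shared, 0) / (call.end - call.start)


def ppv(calls: Sequence[Interval], array_cnvs: Sequence[Interval],
        min_overlap: float = 0.5) -> float:
    """PPV = confirmed calls / all calls (assumed overlap-based concordance rule)."""
    if not calls:
        return float("nan")  # PPV is undefined when nothing was called
    confirmed = sum(
        1 for c in calls
        if any(overlap_fraction(c, r) >= min_overlap for r in array_cnvs)
    )
    return confirmed / len(calls)


# Toy data, not taken from the study.
calls = [
    Interval("chr1", 100, 200, "DEL"),    # fully covered by an array deletion
    Interval("chr1", 500, 600, "DUP"),    # no array support
    Interval("chr2", 100, 300, "DUP"),    # half covered by an array duplication
    Interval("chr2", 1000, 1100, "DEL"),  # array reports a duplication here
]
array_cnvs = [
    Interval("chr1", 50, 250, "DEL"),
    Interval("chr2", 200, 400, "DUP"),
    Interval("chr2", 1000, 1100, "DUP"),
]

print(ppv(calls, array_cnvs))
```

Under these assumptions, calls without array support count as false positives, so the result depends on both the threshold and the array's own resolution.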

Results

Using independently generated datasets, we examined the PPV for each dataset and observed a wide range of PPVs. The PPV values were highly data dependent (p < 0.001). For the sample size and capture method analyses, we tested the callers in triplicate. Both analyses resulted in wide ranges of PPVs, even for the same test. Interestingly, a negative correlation between the PPV and the sample size was observed for CoNIFER (ρ = -0.80). Further examination of FN calls showed that 44% of these were missed by all callers and were attributed to the CNV size (46% spanned ≤3 exons). Overlap of the FP calls showed that FPs were unique to each caller, indicative of algorithm dependency.
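The ρ reported above is written as a rank correlation coefficient. A minimal Python sketch of a Spearman rank correlation between sample size and PPV follows, assuming that is the statistic meant. The sample sizes come from the Methods; the PPV values are hypothetical placeholders, not study results, and the helper names `ranks` and `spearman_rho` are made up for this example.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


sample_sizes = [10, 30, 75, 100, 300]      # sample sizes listed in the Methods
toy_ppv = [0.60, 0.70, 0.50, 0.40, 0.30]   # hypothetical PPVs, NOT study data

rho = spearman_rho(sample_sizes, toy_ppv)
print(round(rho, 2))
```

A negative ρ here only means that PPV tends to fall as the sample size rises in the toy numbers; how the study pooled its triplicate runs into the reported coefficient is not stated in the abstract.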

Conclusions

Our results demonstrate that further improvements in CNV callers are necessary to improve reproducibility and to cover a wider spectrum of CNVs (including small CNVs). These CNV callers should be evaluated on multiple independent, heterogeneously generated datasets of varying size to increase robustness and utility. These approaches to the evaluation of exome CNV callers are essential to support the wide utility and applicability of CNV discovery in exome studies.

SUBMITTER: Hong CS

PROVIDER: S-EPMC4976506 | biostudies-literature | 2016 Aug

REPOSITORIES: biostudies-literature

Publications

Assessing the reproducibility of exome copy number variations predictions.

Hong CS, Singh LN, Mullikin JC, Biesecker LG

Genome Medicine, 2016 Aug 8, issue 1

PMID: 27503473

Similar Datasets

Genomic predictions combining SNP markers and copy number variations in Nellore cattle.
| S-EPMC5989480 | biostudies-literature

A Sparse Model Based Detection of Copy Number Variations From Exome Sequencing Data.
| S-EPMC4808620 | biostudies-literature

WISExome: a within-sample comparison approach to detect copy number variations in whole exome sequencing data.
| S-EPMC5865163 | biostudies-other

Comprehensive Analysis of Copy Number Variations in Kidney Cancer by Single-Cell Exome Sequencing.
| S-EPMC6989475 | biostudies-literature

Copy number variations and stroke.
| S-EPMC5110597 | biostudies-literature

Human subtelomeric copy number variations.
| S-EPMC2731494 | biostudies-literature

Copy number variations among silkworms.
| S-EPMC3997817 | biostudies-literature

Assessing the validity and reproducibility of genome-scale predictions.
| S-EPMC3810853 | biostudies-other

Modeling genetic inheritance of copy number variations.
| S-EPMC2588508 | biostudies-other

Decoding NF1 Intragenic Copy-Number Variations.
| S-EPMC4573439 | biostudies-literature