
Dataset Information


geck: trio-based comparative benchmarking of variant calls.


ABSTRACT: Motivation: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes were previously determined with high confidence. An alternative way of benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information and enable benchmarking of less-studied variants and diverse populations. Results: We introduce a statistical mixture model for comparing two variant calling pipelines from the genotype data they produce after running on the individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty. Availability and implementation: The Python library geck and usage examples are available at https://github.com/sbg/geck under the GNU General Public License v3. Supplementary information: Supplementary data are available at Bioinformatics online.
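
The Mendelian constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the geck library's API and not the mixture model described above. It only counts trio genotype calls that no parental transmission could produce, assuming biallelic autosomal sites with genotypes coded as alternate-allele counts (0, 1, 2). The function names `is_mendelian_consistent` and `violation_rate` are hypothetical.

```python
from itertools import product

# Alleles (0 = reference, 1 = alternate) that a parent with a given
# genotype (alternate-allele count) can transmit to a child.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """Return True if the child genotype can arise from the parental genotypes.

    Assumes a biallelic autosomal site and ignores de novo mutations.
    """
    return any(
        f + m == child
        for f, m in product(TRANSMITTABLE[father], TRANSMITTABLE[mother])
    )


def violation_rate(trios) -> float:
    """Fraction of (father, mother, child) genotype triples that violate
    Mendelian inheritance. Returns 0.0 for an empty input."""
    trios = list(trios)
    if not trios:
        return 0.0
    violations = sum(not is_mendelian_consistent(*t) for t in trios)
    return violations / len(trios)
```

Of the 27 possible genotype triples, 15 are consistent under these assumptions. A raw violation rate like this cannot by itself say which family member was miscalled, which is why a statistical model is needed to turn such counts into precision and recall estimates.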

SUBMITTER: Komar P 

PROVIDER: S-EPMC6184596 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature


Publications

geck: trio-based comparative benchmarking of variant calls.

Kómár P, Kural D

Bioinformatics (Oxford, England), 2018-10-01, issue 20



Similar Datasets

| S-EPMC10710436 | biostudies-literature
| S-EPMC6699627 | biostudies-literature
| S-EPMC6500473 | biostudies-literature
| S-EPMC9294411 | biostudies-literature
| S-EPMC8317106 | biostudies-literature
| S-EPMC6479422 | biostudies-literature
| S-EPMC10045170 | biostudies-literature
| S-EPMC7791862 | biostudies-literature
2018-09-09 | GSE119684 | GEO
| S-EPMC8805713 | biostudies-literature