Dataset Information

Best practices for benchmarking germline small-variant calls in human genomes.

ABSTRACT: Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.

SUBMITTER: Krusche P

PROVIDER: S-EPMC6699627 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature

Publications

Best practices for benchmarking germline small-variant calls in human genomes.

Krusche P, Trigg L, Boutros PC, Mason CE, De La Vega FM, Moore BL, Gonzalez-Porta M, Eberle MA, Tezak Z, Lababidi S, Truty R, Asimenos G, Funke B, Fleharty M, Chapman BA, Salit M, Zook JM

Nature Biotechnology, 2019-03-11, issue 5

PMID: 30858580

Similar Datasets

vcfdist: accurately benchmarking phased small variant calls in human genomes.
S-EPMC10710436 | biostudies-literature

An open resource for accurately benchmarking small variant and reference calls.
S-EPMC6500473 | biostudies-literature

geck: trio-based comparative benchmarking of variant calls.
S-EPMC6184596 | biostudies-literature

Benchmarking and Best Practices for Quantitative Proteomics
2020-04-06 | MSV000085239 | MassIVE

Best practices for eCLIP experiments and analysis
2018-06-08 | GSE107768 | GEO

Benchmarking small-variant genotyping in polyploids.
S-EPMC8805713 | biostudies-literature

Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls.
S-EPMC5467262 | biostudies-literature

Best practices for variant calling in clinical sequencing.
S-EPMC7586657 | biostudies-literature

Best practices for eCLIP experiments and analysis [poor quality]
2018-06-08 | GSE107767 | GEO

Best practices for eCLIP experiments and analysis [uncertain quality]
2018-06-08 | GSE107766 | GEO