
Dataset Information


Benchmarking challenging small variants with linked and long reads.


ABSTRACT: Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly, compared with 85% in the previous version, while excluding regions that are problematic for benchmarking small variants, such as copy number variants, and that should not have been in the previous version. The new benchmark identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.

SUBMITTER: Wagner J 

PROVIDER: S-EPMC9706577 | biostudies-literature | 2022 May

REPOSITORIES: biostudies-literature


Publications

Benchmarking challenging small variants with linked and long reads.

Wagner Justin, Olson Nathan D, Harris Lindsay, Khan Ziad, Farek Jesse, Mahmoud Medhat, Stankovic Ana, Kovacevic Vladimir, Yoo Byunggil, Miller Neil, Rosenfeld Jeffrey A, Ni Bohan, Zarate Samantha, Kirsche Melanie, Aganezov Sergey, Schatz Michael C, Narzisi Giuseppe, Byrska-Bishop Marta, Clarke Wayne, Evani Uday S, Markello Charles, Shafin Kishwar, Zhou Xin, Sidow Arend, Bansal Vikas, Ebert Peter, Marschall Tobias, Lansdorp Peter, Hanlon Vincent, Mattsson Carl-Adam, Barrio Alvaro Martinez, Fiddes Ian T, Xiao Chunlin, Fungtammasan Arkarachai, Chin Chen-Shan, Wenger Aaron M, Rowell William J, Sedlazeck Fritz J, Carroll Andrew, Salit Marc, Zook Justin M

Cell Genomics, 2022 May 1, issue 5



Similar Datasets

| S-EPMC7083898 | biostudies-literature
| S-EPMC10045170 | biostudies-literature
| S-EPMC7083023 | biostudies-literature
| S-EPMC7302376 | biostudies-literature
| S-EPMC10682169 | biostudies-literature
| S-EPMC5990454 | biostudies-literature
| S-EPMC6881392 | biostudies-literature
| S-EPMC11320709 | biostudies-literature
| S-EPMC10703688 | biostudies-literature
| S-EPMC10783151 | biostudies-literature