Dataset Information

Haplotype-aware graph indexes.


ABSTRACT:

Motivation

The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes.

Results

We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.

Availability and implementation

Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Sirén J

PROVIDER: S-EPMC7223266 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

Haplotype-aware graph indexes.

Jouni Sirén, Erik Garrison, Adam M. Novak, Benedict Paten, Richard Durbin

Bioinformatics (Oxford, England), 2020-01-01, issue 2


Similar Datasets

S-EPMC5870570 | biostudies-literature
S-EPMC6547545 | biostudies-literature
S-EPMC9170855 | biostudies-literature
S-EPMC7066762 | biostudies-literature
S-EPMC9104700 | biostudies-literature
S-EPMC10612404 | biostudies-literature
S-EPMC8519448 | biostudies-literature
S-EPMC10274712 | biostudies-literature
S-EPMC9750103 | biostudies-literature
S-EPMC8092372 | biostudies-literature