Unknown

Dataset Information

0

Bit-parallel sequence-to-graph alignment.


ABSTRACT: MOTIVATION:Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph. RESULTS:We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers' bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with |V| nodes and |E| edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of O(|V|+?mw?|E|?log?w) for acyclic graphs and O(|V|+m|E|?log?w) for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm. AVAILABILITY AND IMPLEMENTATION:https://github.com/maickrau/GraphAligner. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.

SUBMITTER: Rautiainen M 

PROVIDER: S-EPMC6761980 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

altmetric image

Publications

Bit-parallel sequence-to-graph alignment.

Rautiainen Mikko M   Mäkinen Veli V   Marschall Tobias T  

Bioinformatics (Oxford, England) 20191001 19


<h4>Motivation</h4>Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include gen  ...[more]

Similar Datasets

| S-EPMC4221118 | biostudies-literature
| S-EPMC8289385 | biostudies-literature
| S-EPMC7513500 | biostudies-literature
| S-EPMC3375188 | biostudies-literature
| S-EPMC3783189 | biostudies-literature
| S-EPMC2268776 | biostudies-literature
| S-EPMC31274 | biostudies-literature
| S-EPMC5136564 | biostudies-literature
| S-EPMC8734264 | biostudies-literature
| S-EPMC4959381 | biostudies-literature