Dataset Information

SPAligner: alignment of long diverged molecular sequences to assembly graphs.


ABSTRACT:

Background

Graph-based representations of genome assemblies have recently been used in a range of contexts, from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications rely heavily on aligning long nucleotide sequences to assembly graphs, the first general-purpose software tools for finding such alignments were released only recently, and their deficiencies and limitations are yet to be discovered. Moreover, existing tools cannot align amino acid sequences, which could prove useful in various contexts, in particular in the analysis of metagenomic sequencing data.

Results

In this work, we present SPAligner (Saint-Petersburg Aligner), a novel tool for aligning long diverged nucleotide and amino acid sequences to assembly graphs. We demonstrate that SPAligner is an efficient solution for mapping third-generation sequencing reads onto assembly graphs of varying complexity, and we show how it can facilitate the identification of known genes in complex metagenomic datasets.

Conclusions

Our work will help accelerate the development of graph-based approaches to the problem of aligning sequences to genome assemblies. SPAligner is implemented as part of the SPAdes tools library and is available on GitHub.

SUBMITTER: Dvorkina T 

PROVIDER: S-EPMC7379835 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

Publications

SPAligner: alignment of long diverged molecular sequences to assembly graphs.

Dvorkina Tatiana, Antipov Dmitry, Korobeynikov Anton, Nurk Sergey

BMC Bioinformatics, 2020 Jul 24, Suppl 12

Similar Datasets

| S-EPMC6330207 | biostudies-literature
| S-EPMC6612831 | biostudies-literature
| S-EPMC5206522 | biostudies-literature
| S-EPMC7859483 | biostudies-literature
| S-EPMC516071 | biostudies-literature
| S-EPMC5836832 | biostudies-other
| S-EPMC7804095 | biostudies-literature
| S-EPMC5799215 | biostudies-literature
| S-EPMC1995545 | biostudies-literature
| S-EPMC6122196 | biostudies-literature