Dataset Information

Founder Reconstruction Enables Scalable and Seamless Pangenomic Analysis.


ABSTRACT: Variant calling workflows that use a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integration into existing workflows remains a challenge. We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts, up to a limit, to form a founder multiple alignment, which is then indexed using a hybrid scheme that exploits general-purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the various steps of our workflow (e.g. the vcf2multialign tool, founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling. Our open-access tools and instructions on how to reproduce our experiments are available at https://github.com/algbio/panvc-founders. Supplementary data are available at Bioinformatics online.
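
The idea of collapsing duplicate parts of a multiple alignment into a smaller set of founder sequences can be illustrated with a toy sketch. This is not the PanVC implementation: the function name `reduce_to_founders`, the fixed-width segmentation, and the padding rule are all assumptions made for illustration, whereas the actual workflow chooses its segmentation and founder limit by its own method.

```python
from typing import List


def reduce_to_founders(msa: List[str], segment_len: int) -> List[str]:
    """Toy founder reduction (hypothetical helper, not the PanVC code).

    Splits the alignment into fixed-width column segments (an assumption),
    keeps only the distinct strings seen in each segment, and stitches them
    into as many founder rows as the most diverse segment requires.
    """
    if not msa or not msa[0]:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all rows of the alignment must have equal length")
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")

    # Distinct strings per segment, in order of first appearance.
    blocks = []
    for start in range(0, width, segment_len):
        seen = {}
        for row in msa:
            seen.setdefault(row[start:start + segment_len], None)
        blocks.append(list(seen))

    # The number of founders is set by the segment with the most variants.
    n_founders = max(len(block) for block in blocks)

    founders = []
    for i in range(n_founders):
        # Segments with fewer variants reuse their last distinct string.
        founders.append("".join(block[min(i, len(block) - 1)] for block in blocks))
    return founders


if __name__ == "__main__":
    rows = ["ACGTAC", "ACGTTT", "AGGTAC", "ACGTAC"]
    print(reduce_to_founders(rows, 2))
```

In this sketch, four input rows shrink to two founders, and every segment of every input row still occurs in some founder at the same position. The founders themselves may be recombinations that do not appear verbatim in the input.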

SUBMITTER: Norri T 

PROVIDER: S-EPMC8665761 | biostudies-literature | 2021 Jul

REPOSITORIES: biostudies-literature

Publications

Founder reconstruction enables scalable and seamless pangenomic analysis.

Tuukka Norri, Bastien Cazaux, Saska Dönges, Daniel Valenzuela, Veli Mäkinen

Bioinformatics (Oxford, England), 2021-12-01 (24)



Similar Datasets

| S-EPMC9737040 | biostudies-literature
| S-EPMC4991573 | biostudies-literature
| S-EPMC9477819 | biostudies-literature
| S-EPMC5498544 | biostudies-literature
| S-EPMC9640686 | biostudies-literature
| S-EPMC11326198 | biostudies-literature
| S-EPMC4634733 | biostudies-literature
| S-EPMC10624853 | biostudies-literature
| S-EPMC8494751 | biostudies-literature
| S-EPMC10872951 | biostudies-literature