Dataset Information

Seq-seq-pan: building a computational pan-genome data structure on whole genome alignment.

ABSTRACT: The increasing application of next generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach to represent a species or a population with a single reference sequence and a set of variations cannot represent their full diversity and introduces bias towards the chosen reference. There is a need for the representation of multiple sequences in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to account for the constant change of available genomes. We introduce seq-seq-pan, a framework that provides methods for adding or removing new genomes from a set of aligned genomes and uses these to construct a whole genome alignment. Throughout the sequential workflow the alignment is optimized for generating a representative linear presentation of the aligned set of genomes, that enables its usage for annotation and in downstream analyses. By providing dynamic updates and optimized processing, our approach enables the usage of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics.

SUBMITTER: Jandrasits C

PROVIDER: S-EPMC5769345 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

seq-seq-pan: building a computational pan-genome data structure on whole genome alignment.

Christine Jandrasits, Piotr W. Dabrowski, Stephan Fuchs, Bernhard Y. Renard

BMC Genomics, 2018 Jan 15, issue 1

PMID: 29334898

Similar Datasets

Improving pan-genome annotation using whole genome multiple alignment. | S-EPMC3142524 | biostudies-literature

Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage. | S-EPMC4832552 | biostudies-literature

Pan-PCR, a computational method for designing bacterium-typing assays based on whole-genome sequence data. | S-EPMC3592046 | biostudies-literature

Building a Chinese pan-genome of 486 individuals. | S-EPMC8405635 | biostudies-literature

Building a pan-genome reference for a population. | S-EPMC4424974 | biostudies-literature

Computational analysis of whole-genome differential allelic expression data in human. | S-EPMC2900287 | biostudies-literature

iRNA-seq: computational method for genome-wide assessment of acute transcriptional regulation from total RNA-seq data. | S-EPMC4381047 | biostudies-literature

Supervised Adversarial Alignment of Single-Cell RNA-seq Data. | S-EPMC8418522 | biostudies-literature

Computational Methods for CLIP-seq Data Processing. | S-EPMC4196881 | biostudies-literature

Computational analysis of bacterial RNA-Seq data. | S-EPMC3737546 | biostudies-literature