Dataset Information

MUMmer4: A fast and versatile genome alignment system.

ABSTRACT: The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.
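
To make the suffix array mentioned in the abstract concrete, here is a minimal Python sketch of the general technique: build a sorted list of suffix start positions, then binary-search it for exact matches. It is a toy illustration only, not MUMmer4's implementation; the function names and the example sequence are hypothetical, and the naive construction would not scale to real genomes. The last lines compare index widths. The abstract does not say how the 141 Tbp limit is derived; the sketch only notes that the figure is numerically close to 2^47.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting whole suffixes; acceptable for a toy
    example, far too slow and memory-hungry for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return every position where pattern occurs in text, in ascending order."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


# Hypothetical toy sequence, not taken from the paper.
reference = "ACGTACGTTACG"
suffix_array = build_suffix_array(reference)
print(find_occurrences(reference, suffix_array, "ACG"))

# Number of distinct positions addressable at each index width.
print(2 ** 32)         # 32-bit index
print(2 ** 48)         # 48-bit index
print(2 ** 47 / 1e12)  # in units of 10^12; close to the 141 Tbp quoted above
```

Because the suffixes are stored in sorted order, all suffixes that begin with a given pattern sit in one contiguous run of the array, which is why two binary searches are enough to find every occurrence.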

SUBMITTER: Marcais G

PROVIDER: S-EPMC5802927 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

MUMmer4: A fast and versatile genome alignment system.

Guillaume Marçais, Arthur L. Delcher, Adam M. Phillippy, Rachel Coston, Steven L. Salzberg, Aleksey Zimin

PLoS Computational Biology, 2018-01-26, issue 1

PMID: 29373581

Similar Datasets

A distributed system for fast alignment of next-generation sequencing data.
| S-EPMC4984844 | biostudies-literature

Fast alignment of fragmentation trees.
| S-EPMC3371839 | biostudies-literature

Flexible Accelerated STOP Tetracycline Operator-knockin (FAST): a versatile and efficient new gene modulating system.
| S-EPMC2969181 | biostudies-literature

GraphAligner: rapid and versatile sequence-to-graph alignment.
| S-EPMC7513500 | biostudies-literature

Shouji: a fast and efficient pre-alignment filter for sequence alignment.
| S-EPMC6821304 | biostudies-literature

Fast gapped-read alignment with Bowtie 2.
| S-EPMC3322381 | biostudies-literature

Fast and accurate read alignment for resequencing.
| S-EPMC3436849 | biostudies-literature

FOGSAA: Fast Optimal Global Sequence Alignment Algorithm.
| S-EPMC3638164 | biostudies-other

A versatile system for rapid multiplex genome-edited CAR T cell generation.
| S-EPMC5370017 | biostudies-literature