Dataset Information

Long read alignment based on maximal exact match seeds.

ABSTRACT:

Motivation

The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger.

Results

We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We evaluated CUSHAW2 against three other long read aligners (BWA-SW, Bowtie2 and GASSST) by aligning simulated and real datasets to the human genome. The performance evaluation shows that CUSHAW2 is consistently among the highest-ranked aligners in terms of alignment quality for both single-end and paired-end alignment, while demonstrating highly competitive speed. Furthermore, our aligner shows good parallel scalability with respect to the number of CPU threads.
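
To make the term "maximal exact match" concrete, the following is a minimal sketch of how such seeds can be enumerated between a read and a reference. It is a brute-force illustration only, not CUSHAW2's implementation, which is written in C++ and is not described in detail in this record. The function name `maximal_exact_matches` and the `min_len` parameter are assumptions introduced here for illustration.

```python
def maximal_exact_matches(read, ref, min_len=3):
    """Return (read_start, ref_start, length) for each maximal exact match.

    A match is maximal when it cannot be extended by one character to the
    left or to the right without introducing a mismatch.
    Brute force, O(len(read) * len(ref) * match length): illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Not left-maximal: the preceding characters also agree, so this
            # position lies inside a longer match that starts earlier.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to show two seeds.
    for seed in maximal_exact_matches("ACGTTGCA", "TTACGTAGTTGCAC", min_len=4):
        print(seed)
```

In a seed-and-extend aligner, seeds like these would then be extended with a gapped alignment step; that step is omitted from this sketch.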

Availability

CUSHAW2, written in C++, and all simulated datasets are available at http://cushaw2.sourceforge.net

Contact

liuy@uni-mainz.de; bertil.schmidt@uni-mainz.de

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Liu Y

PROVIDER: S-EPMC3436841 | biostudies-literature | 2012 Sep

REPOSITORIES: biostudies-literature

Publications

Long read alignment based on maximal exact match seeds.

Liu Y, Schmidt B

Bioinformatics (Oxford, England), 2012-09-01, issue 18

PMID: 22962447

Similar Datasets

Pairwise alignment of nucleotide sequences using maximal exact matches.
| S-EPMC6528274 | biostudies-literature

Exact global alignment using A* with chaining seed heuristic and match pruning.
| S-EPMC10932610 | biostudies-literature

Meta-aligner: long-read alignment based on genome statistics.
| S-EPMC5324271 | biostudies-literature

Improving PacBio long read accuracy by short read alignment.
| S-EPMC3464235 | biostudies-literature

Gene expression levels with various artificial mutations -- exact match
2011-11-01 | GSE30978 | GEO

Confidence intervals that match Fisher's exact or Blaker's exact tests.
| S-EPMC2852239 | biostudies-literature

Long Read Alignment with Parallel MapReduce Cloud Platform.
| S-EPMC4709609 | biostudies-literature

Featherweight long read alignment using partitioned reference indexes.
| S-EPMC6416333 | biostudies-literature

Gene expression levels with various artificial mutations -- exact match
2011-11-01 | E-GEOD-30978 | biostudies-arrayexpress

LSCplus: a fast solution for improving long read accuracy by short read alignment.
| S-EPMC5103424 | biostudies-literature