Dataset Information

Long read alignment based on maximal exact match seeds.


ABSTRACT:

Motivation

The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger.

Results

We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated CUSHAW2 against three other long read aligners (BWA-SW, Bowtie2 and GASSST) by aligning simulated and real datasets to the human genome. The performance evaluation shows that CUSHAW2 is consistently among the highest-ranked aligners in terms of alignment quality for both single-end and paired-end alignment, while demonstrating highly competitive speed. Furthermore, our aligner shows good parallel scalability with respect to the number of CPU threads.
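
To illustrate the general idea of seed-and-extend with maximal exact match (MEM) seeds, here is a minimal Python sketch. It is not CUSHAW2's implementation: the function names, the scoring values, and the padding parameter are all hypothetical. It also finds seeds with a naive quadratic scan and scores the extension with a plain Smith-Waterman recurrence, where a production aligner would use an index over the reference.

```python
def maximal_exact_matches(read, ref, min_len=4):
    """Return (read_start, ref_start, length) for every maximal exact match.

    Naive O(len(read) * len(ref)) scan, for illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Skip matches that could be extended to the left (not maximal).
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while (i + k < len(read) and j + k < len(ref)
                   and read[i + k] == ref[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


def local_alignment_score(a, b, match=1, mismatch=-3, gap=-4):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are arbitrary assumptions for this sketch.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def seed_and_extend(read, ref, min_seed=4, pad=2):
    """Pick the longest MEM as seed, then score a gapped alignment around it.

    Returns (window_start, window_end, score), or None if no seed is found.
    """
    mems = maximal_exact_matches(read, ref, min_seed)
    if not mems:
        return None
    i, j, _ = max(mems, key=lambda m: m[2])
    # Reference window the read would cover if the seed is placed correctly,
    # padded on both sides to leave room for insertions and deletions.
    start = max(0, j - i - pad)
    end = min(len(ref), j + (len(read) - i) + pad)
    return start, end, local_alignment_score(read, ref[start:end])


if __name__ == "__main__":
    print(maximal_exact_matches("ACCATGG", "TTGACCATGGATTC"))
    print(seed_and_extend("ACCATGG", "TTGACCATGGATTC"))
```

Restricting the dynamic-programming step to a small window around the seed is what keeps this style of alignment tractable: the expensive gapped alignment runs only where an exact match already suggests a plausible location.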

Availability

CUSHAW2, written in C++, and all simulated datasets are available at http://cushaw2.sourceforge.net

Contact

liuy@uni-mainz.de; bertil.schmidt@uni-mainz.de

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Liu Y 

PROVIDER: S-EPMC3436841 | biostudies-literature | 2012 Sep

REPOSITORIES: biostudies-literature

Publications

Long read alignment based on maximal exact match seeds.

Yongchao Liu, Bertil Schmidt

Bioinformatics (Oxford, England), 2012 Sep 1, issue 18


Similar Datasets

| S-EPMC6528274 | biostudies-literature
| S-EPMC10932610 | biostudies-literature
| S-EPMC5324271 | biostudies-literature
| S-EPMC3464235 | biostudies-literature
2011-11-01 | GSE30978 | GEO
| S-EPMC2852239 | biostudies-literature
| S-EPMC4709609 | biostudies-literature
| S-EPMC6416333 | biostudies-literature
2011-11-01 | E-GEOD-30978 | biostudies-arrayexpress
| S-EPMC5103424 | biostudies-literature