Dataset Information

Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++.

ABSTRACT: The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net).
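The timings quoted in the abstract can be turned into speedup figures with a little arithmetic. The sketch below is illustrative only and not part of the published work. The function name `speedup`, the `runs` table, and the decision to measure efficiency against the 64-core single-node baseline are assumptions made for this example.

```python
# Illustrative arithmetic on the timings quoted in the abstract.
# Names and the efficiency definition are assumptions for this sketch.

NODES = 32  # nodes used by the parallel runs, as stated in the abstract


def speedup(baseline_hours: float, parallel_minutes: float) -> float:
    """Ratio of single-node runtime to multi-node runtime."""
    return baseline_hours * 60.0 / parallel_minutes


# (single-node runtime in hours, 32-node runtime in minutes)
runs = {
    "single-end": (2.77, 11.54),
    "paired-end": (5.54, 16.64),
}

results = {}
for name, (base_h, par_min) in runs.items():
    s = speedup(base_h, par_min)
    # Efficiency relative to one node: ideal speedup on 32 nodes would be 32x.
    results[name] = (s, s / NODES)
    print(f"{name}: speedup {s:.1f}x, efficiency {s / NODES:.0%}")
```

Under these assumptions the speedup over the single-node multi-threaded run is roughly 14.4x for single-end and 20.0x for paired-end alignment, which corresponds to about 45% and 62% of the ideal 32x.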

SUBMITTER: Gonzalez-Dominguez J

PROVIDER: S-EPMC4711716 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++.

González-Domínguez J, Liu Y, Schmidt B

PLoS ONE, 2016-01-05, issue 1

PMID: 26731399

Similar Datasets

Improving PacBio long read accuracy by short read alignment. | S-EPMC3464235 | biostudies-literature

Long Read Alignment with Parallel MapReduce Cloud Platform. | S-EPMC4709609 | biostudies-literature

Robust and scalable barcoding for massively parallel long-read sequencing. | S-EPMC9090787 | biostudies-literature

Scalable long read self-correction and assembly polishing with multiple sequence alignment. | S-EPMC7804095 | biostudies-literature

LSCplus: a fast solution for improving long read accuracy by short read alignment. | S-EPMC5103424 | biostudies-literature

Arioc: High-concurrency short-read alignment on multiple GPUs. | S-EPMC7676696 | biostudies-literature

Fast and accurate short read alignment with Burrows-Wheeler transform. | S-EPMC2705234 | biostudies-literature

Alfred: interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing. | S-EPMC6612896 | biostudies-literature

ABySS: a parallel assembler for short read sequence data. | S-EPMC2694472 | biostudies-literature

RNA-seq read alignment evaluation | 2013-07-15 | E-MTAB-1728 | biostudies-arrayexpress