Dataset Information

Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

ABSTRACT:

Background

Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations of alignment algorithms on modern accelerators.

Results

This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, both on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency.

Conclusions

Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability with respect to sequence length, dataset size, and the number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
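For readers unfamiliar with the unit, GCUPS (giga cell updates per second) is commonly computed as the number of dynamic-programming cells filled, divided by the runtime in seconds and by 10^9. The sketch below shows that arithmetic in Python; the function name `gcups` and the example workload are made up for illustration and are not measurements from the paper.

```python
def gcups(query_length, database_residues, seconds):
    """Giga cell updates per second for one database scan.

    A scan fills query_length * database_residues DP cells in total.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (query_length * database_residues) / (seconds * 1e9)


# Hypothetical workload: a 1,000-residue query against a database of
# 220 million residues, scanned in one second.
example = gcups(1_000, 220_000_000, 1.0)
```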

SUBMITTER: Lan H

PROVIDER: S-EPMC4959381 | biostudies-literature | 2016 Jul

REPOSITORIES: biostudies-literature

Publications

Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

Haidong Lan, Yuandong Chan, Kai Xu, Bertil Schmidt, Shaoliang Peng, Weiguo Liu

BMC Bioinformatics, 2016 Jul 19

PMID: 27455061

Similar Datasets

Large-scale parallel alignment of platelet-shaped particles through gravitational sedimentation.
| S-EPMC4435022 | biostudies-other

Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues.
| S-EPMC2632924 | biostudies-literature

Parallel clustering algorithm for large-scale biological data sets.
| S-EPMC3976248 | biostudies-literature

Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.
| S-EPMC4410667 | biostudies-literature

Large scale sequence alignment via efficient inference in generative models.
| S-EPMC10160065 | biostudies-literature

Large scale evaluation of differences between network-based and pairwise sequence-alignment-based methods of dendrogram reconstruction.
| S-EPMC6728023 | biostudies-literature

Stochastic block coordinate Frank-Wolfe algorithm for large-scale biological network alignment.
| S-EPMC4826425 | biostudies-literature

Bit-parallel sequence-to-graph alignment.
| S-EPMC6761980 | biostudies-literature

Instability in progressive multiple sequence alignment algorithms.
| S-EPMC4599319 | biostudies-literature

Comparative investigation of parallel spatial interpolation algorithms for building large-scale digital elevation models.
| S-EPMC7924418 | biostudies-literature