
Dataset Information


IMOS: improved Meta-aligner and Minimap2 On Spark.


ABSTRACT: BACKGROUND: Long reads provide valuable information about the sequence composition of genomes. They are usually very noisy, which makes aligning them to the reference genome a daunting task. Processing a dataset large enough to sequence a human genome may take days on a single node. Hence, it is of primary importance to have an aligner that can operate on distributed clusters of computers with high accuracy and speed. RESULTS: In this paper, we present IMOS, an aligner for mapping noisy long reads to a reference genome. It can be used on a single node as well as on distributed nodes. In its single-node mode, IMOS is an Improved version of Meta-aligner (IM) that enhances both its accuracy and speed. IM is up to 6x faster than the original Meta-aligner. IMOS is also implemented to run IM and Minimap2 on Apache Spark for deployment on a cluster of nodes. Moreover, multi-node IMOS is faster than SparkBWA while executing both IM (1.5x) and Minimap2 (25x). CONCLUSION: In this paper, we proposed an architecture for mapping long reads to a reference. Due to its implementation, IMOS speed can increase almost linearly with the number of nodes in a cluster. It is also a multi-platform application able to operate on Linux, Windows, and macOS.

SUBMITTER: Hadadian Nejad Yousefi M 

PROVIDER: S-EPMC6345043 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature


Publications

IMOS: improved Meta-aligner and Minimap2 On Spark.

Hadadian Nejad Yousefi M, Goudarzi M, Motahari SA

BMC Bioinformatics, 2019-01-24, issue 1



Similar Datasets

| S-EPMC6288881 | biostudies-literature
| S-EPMC5324271 | biostudies-literature
| S-EPMC3530905 | biostudies-literature
| S-EPMC4449525 | biostudies-literature
2012-11-06 | GSE39096 | GEO
| S-EPMC6113509 | biostudies-literature
| S-EPMC6131214 | biostudies-literature
| S-EPMC4907389 | biostudies-literature