Dataset Information

smsMap: mapping single molecule sequencing reads by locating the alignment starting positions.


ABSTRACT:

Background

Single Molecule Sequencing (SMS) technology produces longer reads, but with a higher sequencing error rate. Mapping these reads to a reference genome is often the most fundamental and compute-intensive step of downstream analysis. Most existing mapping tools adopt the traditional seed-and-extend strategy, selecting the candidate aligned regions for each query read either by counting the number of matched seeds or by chaining a group of seeds. With existing mapping tools, however, the coverage ratio of the aligned region to the query read remains low, and both alignment quality and efficiency leave room for improvement. Here, we introduce smsMap, a novel mapping tool specifically designed to map long SMS reads to a reference genome.
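As an illustration of the seed-counting variant of seed-and-extend described above, the following minimal Python sketch indexes the k-mers of a reference and lets every matched seed vote for the reference position where the read would start. It is a generic textbook-style sketch, not smsMap's algorithm; the function names, the seed length `k`, and the `bin_size` parameter are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_starts(read, index, k, bin_size=8):
    """Rank candidate alignment start positions by number of matched seeds.

    Each seed hit votes for the diagonal (ref_pos - read_pos), i.e. the
    reference position where the read would begin. Diagonals are binned
    so that small shifts caused by insertions/deletions still vote together.
    """
    votes = Counter()
    for read_pos in range(len(read) - k + 1):
        seed = read[read_pos:read_pos + k]
        for ref_pos in index.get(seed, ()):
            votes[(ref_pos - read_pos) // bin_size] += 1
    # (approximate start position, seed count), best-supported first.
    return [(b * bin_size, n) for b, n in votes.most_common()]


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTATCGATCGGA"
    read = reference[10:22]  # error-free toy read taken from the reference
    index = build_index(reference, k=5)
    print(candidate_starts(read, index, k=5)[:3])
```

A real mapper would follow this voting step with an extension (alignment) step around the best-supported candidates; the sketch stops at candidate selection.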

Results

smsMap was evaluated against seven existing SMS mapping tools (including BLASR, minimap2, and BWA-MEM) on both simulated and real-life SMS datasets. The experimental results show that smsMap efficiently achieves a higher aligned read coverage ratio and higher sensitivity, aligning more sequences and more bases to the reference genome. Additionally, smsMap is more robust to sequencing errors.
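The abstract does not define the aligned read coverage ratio precisely. One plausible reading, assumed here purely for illustration, is the fraction of a read's bases that fall inside its aligned region; the function name and the half-open coordinate convention below are likewise assumptions.

```python
def coverage_ratio(read_length, aligned_start, aligned_end):
    """Fraction of the read covered by the aligned region.

    Coordinates are 0-based and half-open on the read:
    [aligned_start, aligned_end). This definition is an assumption,
    not taken from the smsMap paper.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    if not 0 <= aligned_start <= aligned_end <= read_length:
        raise ValueError("aligned region must lie within the read")
    return (aligned_end - aligned_start) / read_length


if __name__ == "__main__":
    # A 10,000-base read whose alignment skips 500 bases at each end.
    print(coverage_ratio(10_000, 500, 9_500))
```

Under this reading, a mapper that clips less of each read at its ends scores closer to 1.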

Conclusions

smsMap aligns SMS reads with good computational efficiency, especially against large reference genomes (e.g., the H. sapiens genome, with over 3 billion base pairs). The source code of smsMap can be freely downloaded from https://github.com/NWPU-903PR/smsMap .

SUBMITTER: Wei ZG 

PROVIDER: S-EPMC7430848 | biostudies-literature | 2020 Aug

REPOSITORIES: biostudies-literature

Publications

smsMap: mapping single molecule sequencing reads by locating the alignment starting positions.

Wei Ze-Gang, Zhang Shao-Wu, Liu Fei

BMC Bioinformatics, 2020 Aug 4, issue 1



Similar Datasets

| S-EPMC3572422 | biostudies-literature
| S-EPMC4970289 | biostudies-literature
| S-EPMC3707490 | biostudies-literature
| S-EPMC5291262 | biostudies-literature
| S-EPMC7879691 | biostudies-literature
| S-EPMC3118166 | biostudies-literature
| S-EPMC4652746 | biostudies-literature
| S-EPMC4265526 | biostudies-literature
| S-EPMC3581251 | biostudies-literature
| S-EPMC9197060 | biostudies-literature