
Dataset Information


Fast and accurate correction of optical mapping data via spaced seeds.


ABSTRACT:

Motivation

Optical mapping data is used in many core genomics applications, including structural variation detection, scaffolding of assembled contigs and mis-assembly detection. However, the pervasiveness of spurious and deleted cut sites in the raw data, which are called Rmaps, makes assembly and alignment of these data challenging. Although another method for error correcting Rmap data exists, namely cOMet, it is unable to scale to even moderately large genomes. The central challenge in error correction is determining pairs of Rmaps that originate from the same region of the same genome.
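A toy illustration of why deleted and spurious cut sites make overlap detection hard follows. It is a minimal sketch, assuming hypothetical fragment sizes in kbp and no sizing error; the helper names (`delete_cut`, `spurious_cut`, `cut_sites`, `fragment_kmers`) are illustrative and not part of Elmeri. A deleted cut site merges two adjacent fragments, while a spurious cut site splits one fragment in two.

```python
from itertools import accumulate

def delete_cut(fragments, i):
    """Missed cut site between fragments i and i+1: the two fragments merge."""
    return fragments[:i] + [fragments[i] + fragments[i + 1]] + fragments[i + 2:]

def spurious_cut(fragments, i, offset):
    """Extra cut site `offset` units into fragment i: the fragment splits in two."""
    assert 0 < offset < fragments[i]
    return fragments[:i] + [offset, fragments[i] - offset] + fragments[i + 1:]

def cut_sites(fragments):
    """Cut-site positions along the molecule (the molecule end is not a cut)."""
    return list(accumulate(fragments))[:-1]

def fragment_kmers(fragments, k):
    """All runs of k consecutive fragment sizes, i.e. exact-match seeds."""
    return {tuple(fragments[i:i + k]) for i in range(len(fragments) - k + 1)}

true_rmap = [12, 8, 20, 15, 10]  # hypothetical error-free fragment sizes (kbp)
# Delete the cut at 20 kbp, then add a spurious cut 6 kbp into the 4th fragment.
observed = spurious_cut(delete_cut(true_rmap, 1), 3, 6)

print(observed)                                                     # [12, 28, 15, 6, 4]
print(fragment_kmers(true_rmap, 2) & fragment_kmers(observed, 2))   # set()
print(sorted(set(cut_sites(true_rmap)) & set(cut_sites(observed))))  # [12, 40, 55]
```

Two errors are enough to leave no run of two consecutive fragment sizes in common, even though three of the four true cut sites (12, 40 and 55 kbp) are still present. Finding Rmaps from the same region therefore has to tolerate missing and extra cut sites rather than match fragment sequences exactly.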

Results

We create an efficient method for determining pairs of Rmaps that share significant overlaps. Our method relies on a novel and nontrivial adaptation and application of spaced seeds in the context of optical mapping, which allows spurious and deleted cut sites to be accounted for. We apply our method to detecting and correcting these errors. The resulting error correction method, referred to as Elmeri, improves upon the results of state-of-the-art correction methods in a fraction of the time. More specifically, cOMet required 9.9 CPU days to error correct Rmap data generated from the human genome, whereas Elmeri required less than 15 CPU hours and improved the quality of the Rmaps by more than four times compared with cOMet.
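The abstract does not spell out how spaced seeds are adapted to Rmaps, so the following is only a minimal sketch of the general idea under our own assumptions, not Elmeri's actual seed definition. Cut-site positions are binned into fixed-width bins (a hypothetical `bin_size`, here 4 kbp) to form a 0/1 profile, and a spaced-seed mask marks which bins must agree (`1`) and which are ignored (`0`). Rmaps that share a seed key become candidate overlapping pairs. The function names, the mask and the toy Rmaps are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cut_profile(fragments, bin_size):
    """0/1 string with '1' wherever a cut site falls in a bin of width bin_size."""
    length = sum(fragments)
    n_bins = (length + bin_size - 1) // bin_size  # ceiling division
    profile = ["0"] * n_bins
    pos = 0
    for frag in fragments[:-1]:  # the last fragment ends at the molecule end, not a cut
        pos += frag
        profile[pos // bin_size] = "1"
    return "".join(profile)

def spaced_keys(profile, mask):
    """Seed key of every window: the profile bins at the mask's '1' positions only."""
    care = [i for i, m in enumerate(mask) if m == "1"]
    for start in range(len(profile) - len(mask) + 1):
        window = profile[start:start + len(mask)]
        yield "".join(window[i] for i in care)

def candidate_pairs(rmaps, bin_size, mask, min_shared=1):
    """Pairs of Rmap ids sharing at least min_shared distinct spaced-seed keys."""
    index = defaultdict(set)  # seed key -> ids of the Rmaps containing it
    for rid, fragments in rmaps.items():
        for key in spaced_keys(cut_profile(fragments, bin_size), mask):
            if "1" in key:  # all-zero keys (no cut sites) carry no signal
                index[key].add(rid)
    shared = defaultdict(int)
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1
    return {pair for pair, n in shared.items() if n >= min_shared}

rmaps = {
    "A": [12, 8, 20, 15, 10],  # error-free fragment sizes (kbp)
    "B": [12, 28, 15, 6, 4],   # same region: cut at 20 deleted, spurious cut at 61
    "C": [5, 30, 7, 23],       # unrelated molecule
}
print(candidate_pairs(rmaps, bin_size=4, mask="11011111111"))  # {('A', 'B')}
```

With 4 kbp bins, A and B share the key of the window starting at bin 3 because the mask's don't-care position lands on bin 5, where A's cut at 20 kbp was deleted in B. A contiguous mask of the same weight, `"1" * 10`, finds no key shared by A and B. A single don't-care position only tolerates an error at that one offset; spaced-seed methods in general often combine several don't-care positions or several masks.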

Availability and implementation

Elmeri is publicly available under the GNU Affero General Public License at https://github.com/LeenaSalmela/Elmeri.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Salmela L 

PROVIDER: S-EPMC7005598 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature


Publications

Fast and accurate correction of optical mapping data via spaced seeds.

Leena Salmela, Kingshuk Mukherjee, Simon J. Puglisi, Martin D. Muggli, Christina Boucher

Bioinformatics (Oxford, England), 1 February 2020, issue 3



Similar Datasets

| S-EPMC3627565 | biostudies-literature
| S-EPMC5409309 | biostudies-literature
| S-EPMC2752623 | biostudies-literature
| S-EPMC3392737 | biostudies-literature
| S-EPMC4457958 | biostudies-literature
| S-EPMC6941997 | biostudies-literature
| S-EPMC3009509 | biostudies-literature
| S-EPMC6280799 | biostudies-other
| S-EPMC5181568 | biostudies-literature
| S-EPMC5747520 | biostudies-literature