
Dataset Information


OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.


ABSTRACT:

BACKGROUND: Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences.

RESULTS: We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests.

CONCLUSIONS: We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.
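As a rough illustration of what indexing continuous-valued map data can involve, the sketch below looks up seed candidates by fragment size within a relative tolerance. It is not OPTIMA's actual index or error model: the function names `build_index` and `find_seeds`, the tolerance parameter `tol`, and the example fragment sizes are all hypothetical, and the sketch ignores missing or spurious cut sites, which a real map aligner has to handle.

```python
from bisect import bisect_left, bisect_right


def build_index(ref_sizes, k=2):
    """Index every window of k consecutive reference fragments.

    Entries are (size of first fragment, start position), sorted by size,
    so a size range can be located with binary search.
    Hypothetical helper, not part of OPTIMA.
    """
    return sorted((ref_sizes[i], i) for i in range(len(ref_sizes) - k + 1))


def find_seeds(index, ref_sizes, query, tol=0.1):
    """Return reference positions whose fragments match `query` within `tol`.

    Assumed matching rule: a reference size r matches a query size q
    when |r - q| <= tol * q.
    """
    k = len(query)
    # Narrow candidates using the first fragment size only.
    lo = bisect_left(index, (query[0] * (1 - tol), -1))
    hi = bisect_right(index, (query[0] * (1 + tol), len(ref_sizes)))
    hits = []
    for _, pos in index[lo:hi]:
        window = ref_sizes[pos:pos + k]
        # Verify every fragment in the window, not just the first.
        if len(window) == k and all(
            abs(r - q) <= tol * q for r, q in zip(window, query)
        ):
            hits.append(pos)
    return sorted(hits)


# Made-up fragment sizes in kbp, for illustration only.
reference = [12.0, 30.5, 8.2, 45.0, 12.4, 29.0, 50.1]
index = build_index(reference, k=2)
seeds = find_seeds(index, reference, [12.1, 30.0], tol=0.1)
```

In a seed-and-extend scheme, each returned position would then be handed to an extension step that scores a longer alignment around the seed; that step is omitted here.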

SUBMITTER: Verzotto D 

PROVIDER: S-EPMC4719737 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature


Publications

OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.

Verzotto D, M Teo AS, Hillmer AM, Nagarajan N

GigaScience, 2016-01-19



Similar Datasets

| S-EPMC7531523 | biostudies-literature
| S-EPMC8254269 | biostudies-literature
| S-EPMC6966875 | biostudies-literature
| S-EPMC1955456 | biostudies-literature
| S-EPMC4414605 | biostudies-literature
| S-EPMC8678206 | biostudies-literature
| S-EPMC9839601 | biostudies-literature
| S-EPMC6474799 | biostudies-literature
| S-EPMC10881975 | biostudies-literature
| S-EPMC5908213 | biostudies-literature