Dataset Information

WarpSTR: determining tandem repeat lengths using raw nanopore signals.


ABSTRACT:

Motivation

Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses but is limited by sequencing technology, mainly because STRs can exceed the read length used. Nanopore sequencing, one of the long-read sequencing technologies, produces very long reads and thus offers more possibilities to study and analyze STRs. However, basecalling of nanopore reads is particularly unreliable in repeating regions, so direct analysis of raw nanopore data is required.

Results

Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that our approach decreases the mean absolute error of the STR length estimate compared to basecalling and STRique.
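To make the idea of a search "analogous to dynamic time warping" concrete, the following is a minimal sketch of classic dynamic time warping (DTW) applied to a toy problem: picking the repeat count whose expected signal aligns best with an observed signal. It is not WarpSTR's implementation. The function names `dtw_cost` and `best_copy_count`, the integer "signal levels", and the brute-force search over copy counts are all assumptions made for illustration; WarpSTR itself uses a finite-state automaton rather than enumerating candidate counts.

```python
import math


def dtw_cost(signal, expected):
    """Classic DTW cost between two numeric sequences (absolute difference)."""
    n, m = len(signal), len(expected)
    # cost[i][j] = best cost of aligning signal[:i] with expected[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - expected[j - 1])
            # Extend the cheapest of: repeat expected[j-1], repeat signal[i-1],
            # or advance in both sequences.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_copy_count(signal, unit_levels, max_copies):
    """Hypothetical helper: try 1..max_copies copies of the motif's expected
    levels and return the count with the lowest DTW cost."""
    costs = {k: dtw_cost(signal, unit_levels * k) for k in range(1, max_copies + 1)}
    return min(costs, key=costs.get)


# Toy observed signal: three copies of a 1-2-3 pattern, each level held
# for a varying number of samples (as happens with nanopore dwell times).
observed = [1, 1, 2, 2, 3, 3, 1, 2, 2, 3, 1, 1, 2, 3]
print(best_copy_count(observed, [1, 2, 3], max_copies=5))
```

The warping step is what lets one expected level absorb a variable number of raw samples, which is why this family of methods suits signals where the time spent on each position is not constant.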

Availability and implementation

WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr.

SUBMITTER: Sitarcik J 

PROVIDER: S-EPMC10307940 | biostudies-literature | 2023 Jun

REPOSITORIES: biostudies-literature

Publications

WarpSTR: determining tandem repeat lengths using raw nanopore signals.

Sitarčík J, Vinař T, Brejová B, Krampl W, Budiš J, Radvánszky J, Lucká M

Bioinformatics (Oxford, England), 2023 Jun 1, issue 6


Similar Datasets

| S-EPMC9520528 | biostudies-literature
| S-EPMC10173771 | biostudies-literature
| S-EPMC7918261 | biostudies-literature
| S-EPMC11424183 | biostudies-literature
| S-EPMC8896783 | biostudies-literature
| S-EPMC10311405 | biostudies-literature
| S-EPMC8275641 | biostudies-literature
| S-EPMC5946831 | biostudies-literature
| S-EPMC8482760 | biostudies-literature
| S-EPMC10997813 | biostudies-literature