Dataset Information

Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms.


ABSTRACT: Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoamperage currents, which are segmented into step-current level signals representing the nucleotide sequence. These 'squiggles' are a noisy, distorted representation of the underlying true stepped current levels due to experimental and algorithmic factors. We were interested in developing a simulation model to support a white-box approach to identifying common distortions, rather than relying on the commonly used black-box neural network techniques for basecalling nanopore signals. Dynamic time warped-space averaging (DTWA) techniques can generate a consensus from multiple noisy signals without introducing the key feature distortions that occur with standard averaging. As a preprocessing tool, DTWA could provide cleaner and more accurate current signals for direct RNA or DNA analysis tools. However, DTWA approaches need modification to take advantage of a priori knowledge of a common, underlying gold-standard RNA / DNA sequence. Using experimental data, we derive a simulation model that provides known squiggle distortion signals to assist in validating the performance of analysis tools such as DTWA. Simulation models were evaluated by comparing mocked and experimental squiggle characteristics from one Enolase mRNA squiggle group produced by an Oxford MinION nanopore sequencer, and cross-validated using other Enolase, Sequin R1_71_1 and Sequin R2_55_3 mRNA studies. New techniques identified high inserted-base but low deleted-base rates, generating a consistent 1.7× ratio of squiggle events to called bases. Similar probability density functions (PDFs) and cumulative distribution functions (CDFs) were found across all studies. Experimental PDFs were not the normal distributions expected if squiggle distortion arose from segmentation algorithm artefacts, or through individual nucleotides randomly interacting with individual nanopores. Matching experimental and mocked CDFs required the assumption that there are unique features associated with individual raw-current data streams. Z-normalized signal-to-noise ratios suggest that intrinsic sensor limitations are responsible for half of the DTW differences between gold-standard and noisy squiggles.
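
The comparison the abstract describes, a z-normalized gold-standard stepped signal set against a noisy, event-inserted squiggle and scored with DTW, can be illustrated with a minimal Python sketch. This is not the authors' simulation model. The function names, the repeat-event insertion rule, the Gaussian noise, and the parameter values (`insert_prob`, `noise_sd`) are all assumptions chosen for illustration; `insert_prob=0.41` is picked only because repeating events with that probability gives an expected ratio of roughly 1.7 events per base.

```python
import math
import random


def z_normalize(signal):
    """Shift a signal to zero mean and scale it to unit standard deviation."""
    mean = sum(signal) / len(signal)
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / len(signal))
    if std == 0:
        return [0.0 for _ in signal]
    return [(x - mean) / std for x in signal]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # extend the cheapest of: step in a only, step in b only, step in both
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def mock_squiggle(levels, insert_prob=0.41, noise_sd=1.0, seed=0):
    """Hypothetical distortion model (an assumption, not the paper's):
    each base emits one noisy event, then repeats with probability insert_prob."""
    rng = random.Random(seed)
    events = []
    for level in levels:
        events.append(level + rng.gauss(0.0, noise_sd))
        while rng.random() < insert_prob:
            events.append(level + rng.gauss(0.0, noise_sd))
    return events


if __name__ == "__main__":
    # Made-up stepped current levels standing in for a gold-standard signal.
    gold = [80.0, 95.0, 70.0, 110.0, 88.0, 102.0, 76.0, 91.0]
    noisy = mock_squiggle(gold)
    print("events per base:", len(noisy) / len(gold))
    print("DTW distance:", dtw_distance(z_normalize(gold), z_normalize(noisy)))
```

Because DTW can align one gold-standard level with several consecutive events, pure repeat insertions cost nothing in this sketch; any remaining distance comes from the added noise and from the shift in normalization that the extra events cause.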

SUBMITTER: Smith M 

PROVIDER: S-EPMC6638935 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms.

Smith M, Chan R, Gordon P

PLoS One, 2019-07-18, issue 7


Similar Datasets

| S-EPMC3190662 | biostudies-other
| S-EPMC8796371 | biostudies-literature
| S-EPMC9354552 | biostudies-literature
| S-EPMC6350616 | biostudies-literature
| S-EPMC7596462 | biostudies-literature
| S-EPMC5226842 | biostudies-literature
2009-11-24 | GSE15370 | GEO
| S-EPMC6506227 | biostudies-literature
| S-EPMC7178565 | biostudies-literature
2010-05-19 | E-GEOD-15370 | biostudies-arrayexpress