
Dataset Information

NanoSplicer: accurate identification of splice junctions using Oxford Nanopore sequencing.


ABSTRACT:

Motivation

Long-read sequencing methods have considerable advantages for characterizing RNA isoforms. Oxford Nanopore sequencing records changes in electrical current as a nucleic acid passes through a pore. However, basecalling of this raw signal (known as a squiggle) is error-prone, making it challenging to identify splice junctions accurately. Existing strategies correct nanopore reads using matched short-read data and/or annotated splice junctions, but these either add expense or limit junctions to known (and incomplete) annotations. A method that could accurately identify splice junctions solely from nanopore data would therefore avoid both the added cost and the dependence on existing annotations.

Results

We developed 'NanoSplicer' to identify splice junctions using raw nanopore signal (squiggles). For each splice junction, the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (i) synthetic mRNAs with known splice junctions and (ii) biological mRNAs from a lung cancer cell line. The results from both datasets demonstrate that NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated.
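The comparison step described above can be illustrated with a minimal sketch. This is not NanoSplicer's implementation: the abstract does not specify the similarity measure or the probability model, so the sketch assumes dynamic time warping (DTW) as the similarity measure and a softmax over negative distances as the normalization. The function names, the `scale` parameter, and the toy signals are all hypothetical.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D signals.

    Assumption: DTW stands in for the squiggle similarity measure,
    which the abstract does not specify.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def candidate_probabilities(observed, candidates, scale=1.0):
    """Convert distances to normalized probabilities.

    Assumption: a softmax over negative distances; `scale` is a
    hypothetical temperature, not a NanoSplicer parameter.
    """
    dists = {name: dtw_distance(observed, sig) for name, sig in candidates.items()}
    best = min(dists.values())  # subtract the minimum for numerical stability
    weights = {name: math.exp(-(d - best) / scale) for name, d in dists.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Toy signals (made up for illustration, not real nanopore current levels).
observed = [1.0, 1.1, 3.0, 3.1, 2.0]
candidates = {
    "junction_A": [1.0, 3.0, 2.0],
    "junction_B": [1.0, 2.0, 3.0],
}
probs = candidate_probabilities(observed, candidates)
best_junction = max(probs, key=probs.get)
```

Here `best_junction` is the candidate whose expected signal warps onto the observed signal at the lowest cost. A real pipeline would first derive each candidate squiggle from the candidate junction sequence using a pore model.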

Availability and implementation

NanoSplicer is available at https://github.com/shimlab/NanoSplicer and archived at https://doi.org/10.5281/zenodo.6403849. Data are available from ENA: ERS7273757 and ERS7273453.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: You Y 

PROVIDER: S-EPMC9344838 | biostudies-literature | 2022 Aug

REPOSITORIES: biostudies-literature

Publications

NanoSplicer: accurate identification of splice junctions using Oxford Nanopore sequencing.

You Y, Clark MB, Shim H

Bioinformatics (Oxford, England), 2022-08-01, issue 15



Similar Datasets

S-EPMC9773502 | biostudies-literature
S-EPMC6039791 | biostudies-literature
S-EPMC9911510 | biostudies-literature
S-EPMC5093776 | biostudies-literature
S-EPMC4970289 | biostudies-literature
S-EPMC7141017 | biostudies-literature
S-EPMC5513335 | biostudies-literature
S-EPMC9045196 | biostudies-literature
S-EPMC10303500 | biostudies-literature
S-EPMC7841463 | biostudies-literature