
Dataset Information


NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data.


ABSTRACT:

Motivation

Oxford Nanopore sequencing has great potential and clear advantages for population-scale studies. Because of sequencing costs, the whole-genome sequencing depth per individual sample must be kept low. However, existing single nucleotide polymorphism (SNP) callers are designed for high-coverage Nanopore sequencing reads, and detecting SNPs in low-coverage Nanopore sequencing data remains a challenging problem.

Results

We developed a novel deep learning-based SNP calling method, NanoSNP, to identify SNP sites (excluding short indels) from low-coverage Nanopore sequencing reads. NanoSNP uses a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, its pileup model uses naive pileup features and a bidirectional long short-term memory (Bi-LSTM) network to predict a subset of SNP sites. Next, these SNP sites are phased and used to divide the low-coverage Nanopore reads into haplotypes. Finally, a long-range haplotype feature and a short-range pileup feature are extracted from each haplotype, and the haplotype model combines the two features to predict the genotype of each candidate site with a Bi-LSTM network. To evaluate its performance, we compared NanoSNP with Clair, Clair3, Pepper-DeepVariant and NanoCaller on low-coverage (∼16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes, HG002-HG007. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs in low-coverage Nanopore sequencing data, including in difficult-to-map regions and the major histocompatibility complex (MHC) region of the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×.
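The two non-neural steps described above (building a pileup feature, then splitting reads by phased SNP alleles so features can be recomputed per haplotype) can be illustrated with a minimal Python sketch. It is not NanoSNP's implementation: it assumes each read has already been reduced to a hypothetical position-to-base dictionary, assumes phasing has already produced one (haplotype 1, haplotype 2) allele pair per heterozygous SNP, simplifies the pileup feature to per-base allele frequencies, and omits the Bi-LSTM models entirely. All names (`Read`, `pileup_feature`, `assign_haplotype`, `split_by_haplotype`) are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

BASES = "ACGT"

# Hypothetical read representation: reference position -> observed base.
# A real caller would derive this from alignments (e.g. a BAM file).
Read = Dict[int, str]


def pileup_feature(reads: List[Read], pos: int) -> List[float]:
    """Simplified pileup feature: fraction of covering reads showing A, C, G, T at `pos`."""
    counts = Counter(read[pos] for read in reads if pos in read)
    depth = sum(counts[b] for b in BASES)
    if depth == 0:
        return [0.0] * len(BASES)
    return [counts[b] / depth for b in BASES]


def assign_haplotype(read: Read, phased: Dict[int, Tuple[str, str]]) -> int:
    """Majority vote over phased SNPs: 1 or 2 for a haplotype, 0 if tied or uncovered."""
    votes = [0, 0]
    for pos, (hap1_allele, hap2_allele) in phased.items():
        base = read.get(pos)
        if base == hap1_allele:
            votes[0] += 1
        elif base == hap2_allele:
            votes[1] += 1
    if votes[0] > votes[1]:
        return 1
    if votes[1] > votes[0]:
        return 2
    return 0


def split_by_haplotype(
    reads: List[Read], phased: Dict[int, Tuple[str, str]]
) -> Dict[int, List[Read]]:
    """Group reads into haplotype 1, haplotype 2 and unassigned (key 0)."""
    groups: Dict[int, List[Read]] = {0: [], 1: [], 2: []}
    for read in reads:
        groups[assign_haplotype(read, phased)].append(read)
    return groups


if __name__ == "__main__":
    reads = [
        {100: "A", 200: "C", 300: "G"},
        {100: "A", 200: "C", 300: "T"},
        {100: "G", 200: "T", 300: "G"},
        {100: "G", 200: "T"},
        {300: "G"},
    ]
    phased = {100: ("A", "G"), 200: ("C", "T")}
    groups = split_by_haplotype(reads, phased)
    # Per-haplotype feature at a candidate site, the kind of input a haplotype model consumes.
    print(pileup_feature(groups[1], 300))  # [0.0, 0.0, 0.5, 0.5]
```

In this toy example the mixed signal at position 300 (G in three reads, T in one) separates once reads are grouped: haplotype 1 shows G and T equally while haplotype 2 shows only G. This is the intuition behind computing features per haplotype, though the actual features and models used by NanoSNP are richer than this frequency vector.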

Availability and implementation

https://github.com/huangnengCSU/NanoSNP.git.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Huang N 

PROVIDER: S-EPMC9822538 | biostudies-literature | 2023 Jan

REPOSITORIES: biostudies-literature


Publications

NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data.

Huang Neng, Xu Minghua, Nie Fan, Ni Peng, Xiao Chuan-Le, Luo Feng, Wang Jianxin

Bioinformatics (Oxford, England), 2023 Jan 1, issue 1



Similar Datasets

| S-EPMC5966861 | biostudies-literature
| S-EPMC3848615 | biostudies-literature
| S-EPMC3711422 | biostudies-literature
| S-EPMC8673642 | biostudies-literature
| S-EPMC6582154 | biostudies-literature
| S-EPMC8762119 | biostudies-literature
| S-EPMC4550471 | biostudies-literature
| S-EPMC5209917 | biostudies-literature
| S-EPMC5737671 | biostudies-literature
| S-EPMC8034624 | biostudies-literature