
Dataset Information


De novo diploid genome assembly using long noisy reads.


ABSTRACT: The high sequencing error rate of long noisy reads has impeded their use in diploid genome assembly, and most existing assemblers fail to generate high-quality phased assemblies from such reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that retains heterozygous alleles while correcting sequencing errors. We combine a corrected-read SNP caller and a raw-read SNP caller to further improve the identification of inconsistent overlaps in the string graph, and we use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using only Nanopore R9, PacBio CLR, or Nanopore R10 reads, and it generates more contiguous haplotype-specific contigs than other assemblers. In particular, PECAT achieves a nearly haplotype-resolved assembly of B. taurus (Bison×Simmental) using Nanopore R9 reads and a phase block NG50 of 59.4/58.0 Mb for HG002 using Nanopore R10 reads.
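
NOTE: The abstract reports contiguity as a phase block NG50. As a rough illustration of that metric only (not PECAT's own evaluation code), the sketch below computes NG50 under its usual definition: the largest length L such that blocks of length L or longer together cover at least half of an assumed genome size. The function name `ng50` and the toy numbers are hypothetical and are not taken from the paper.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a collection of block (or contig) lengths.

    NG50 is the largest length L such that blocks of length >= L
    together cover at least half of genome_size. Returns 0 when the
    blocks never reach half of the genome size.
    """
    half = genome_size / 2
    covered = 0
    # Walk from the longest block down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


# Toy example with made-up block lengths and an assumed genome size.
blocks = [50, 40, 30, 20, 10]
print(ng50(blocks, genome_size=200))
```

Unlike N50, which uses the total assembled length as the denominator, NG50 uses a fixed genome size, so values from different assemblies of the same genome are directly comparable.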

SUBMITTER: Nie F 

PROVIDER: S-EPMC10997618 | biostudies-literature | 2024 Apr

REPOSITORIES: biostudies-literature


Publications

De novo diploid genome assembly using long noisy reads.

Fan Nie, Peng Ni, Neng Huang, Jun Zhang, Zhenyu Wang, Chuanle Xiao, Feng Luo, Jianxin Wang

Nature Communications, 2024-04-05, issue 1



Similar Datasets

| S-EPMC8549298 | biostudies-literature
| S-EPMC9632051 | biostudies-literature
| S-EPMC5770995 | biostudies-literature
| S-EPMC5765664 | biostudies-literature
| S-EPMC10091225 | biostudies-literature
| S-EPMC5411768 | biostudies-literature
| S-EPMC8771625 | biostudies-literature
| S-EPMC6925183 | biostudies-literature
| S-EPMC6487145 | biostudies-literature
| S-EPMC8085491 | biostudies-literature