Dataset Information

Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches.


ABSTRACT: Complementary to reference-based variant detection, recent studies have revealed that many novel variants can be detected with de novo assembled genomes. To evaluate the effect of read coverage and the accuracy of assembly-based variant calling, we simulated short reads containing more than 3 million single nucleotide variants (SNVs) from the whole human genome and compared the efficiency of SNV calling between the assembly-based and alignment-based approaches. We assessed the quality of the assembled contigs and found that a minimum of 30X short-read coverage was needed to ensure reliable SNV calling and to generate assembled contigs with good coverage of the genome and genes. In addition, we observed that the assembly-based approach had much lower recall and precision than the alignment-based approach, which recovered 99% of imputed SNVs. We observed similar results with experimental reads for NA24385, an individual whose germline variants are well characterized. Although it adds value for SNV detection, the assembly-based approach carries a substantial risk of false discovery of novel SNVs. Further improvement of de novo assembly algorithms is needed to ensure good genome completeness, haplotype resolution, and high fidelity of the assembled sequences.
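The abstract compares the two approaches by recall and precision against a known set of SNVs. As a minimal sketch of how such metrics are commonly computed (not the authors' pipeline), the Python below scores a call set against a truth set. The function name `score_snv_calls`, the tuple layout for a variant, and the toy data are all assumptions made for illustration.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class CallingMetrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def score_snv_calls(truth: Set[Variant], called: Set[Variant]) -> CallingMetrics:
    """Compare called SNVs with a truth set using exact-match semantics."""
    tp = len(truth & called)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return CallingMetrics(tp, fp, fn, precision, recall)


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 900, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 42, "A", "C"),   # a false positive
}

metrics = score_snv_calls(truth, called)
print(metrics)
```

With this toy input, three calls match the truth set, one call is spurious, and one true SNV is missed, so both precision and recall come out at 0.75. Real benchmarks usually normalize variant representation before matching; that step is omitted here.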

SUBMITTER: Wu L 

PROVIDER: S-EPMC5591230 | biostudies-literature | 2017 Sep

REPOSITORIES: biostudies-literature

Publications

Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches.

Leihong Wu, Gokhan Yavas, Huixiao Hong, Weida Tong, Wenming Xiao

Scientific Reports, 2017-09-08, issue 1



Similar Datasets

| S-EPMC4504710 | biostudies-literature
| S-EPMC4757955 | biostudies-literature
2018-02-06 | GSE110114 | GEO
| S-EPMC10045170 | biostudies-literature
| S-EPMC7044309 | biostudies-literature
| S-EPMC7809098 | biostudies-literature
| S-EPMC9710574 | biostudies-literature
| EGAS00001007819 | EGA
| S-EPMC4753679 | biostudies-literature
| S-EPMC3792961 | biostudies-literature