Dataset Information

Comparative analysis of de novo assemblers for variation discovery in personal genomes.


ABSTRACT: Current variant discovery approaches often rely on an initial mapping of reads to the reference sequence. Their effectiveness is limited by gaps, potential misassemblies, duplicated regions with high sequence similarity and regions of high sequence divergence in the reference. Mapping-based approaches are also less sensitive to large INDELs and complex variations, and they provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a trio of Utah residents with Northern and Western European ancestry (CEU), showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES and, more prominently, from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome.

SUBMITTER: Tian S 

PROVIDER: S-EPMC6169673 | biostudies-literature | 2018 Sep

REPOSITORIES: biostudies-literature

Publications

Comparative analysis of de novo assemblers for variation discovery in personal genomes.

Shulan Tian, Huihuang Yan, Eric W Klee, Michael Kalmbach, Susan L Slager

Briefings in bioinformatics 20180901 5



Similar Datasets

| S-EPMC4290589 | biostudies-literature
| S-EPMC10984382 | biostudies-literature
| S-EPMC3091720 | biostudies-literature
| S-EPMC8733867 | biostudies-literature
| S-EPMC8270462 | biostudies-literature
2021-02-01 | GSE165787 | GEO
| S-EPMC4745987 | biostudies-literature
| S-EPMC5610382 | biostudies-literature
| S-EPMC10036691 | biostudies-literature
| S-EPMC8166135 | biostudies-literature