Dataset Information

MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data.


ABSTRACT:

MOTIVATION: Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques.

RESULTS: A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality.

AVAILABILITY AND IMPLEMENTATION: The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems

CONTACT: xinping.cui@ucr.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

SUBMITTER: Murillo GH 

PROVIDER: S-EPMC6280882 | biostudies-literature | 2016 May

REPOSITORIES: biostudies-literature

Publications

MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data.

Murillo GH, You N, Su X, Cui W, Reilly MP, Li M, Ning K, Cui X

Bioinformatics (Oxford, England), 2016-01-18, issue 10


Similar Datasets

| S-EPMC3338331 | biostudies-literature
| S-EPMC3458526 | biostudies-other
| S-EPMC5333443 | biostudies-literature
| S-EPMC7129681 | biostudies-literature
| S-EPMC7336441 | biostudies-literature
| S-EPMC8480091 | biostudies-literature
| S-EPMC5870807 | biostudies-literature
| S-EPMC4302989 | biostudies-literature
| S-EPMC8704571 | biostudies-literature
| S-EPMC2917361 | biostudies-literature