Dataset Information

MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data.


ABSTRACT: Most existing statistical methods for calling single nucleotide polymorphisms (SNPs) from next-generation sequencing (NGS) data are based on Bayesian frameworks, and no existing SNP caller produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the parameter involved is very close to the boundary of the parameter space, so standard large-sample theory is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of a point mass at zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters of the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control the false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of the FDR than existing SNP callers. Through application to two real datasets, MAFsnp is also shown to outperform existing SNP callers in terms of calling accuracy. An R package "MAFsnp" implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/.
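
To make the p-value and FDR steps described in the abstract concrete, here is a minimal Python sketch. It is not the MAFsnp implementation and does not use the R package's API. The abstract does not give the form of the two-parameter mixture, so the sketch assumes, purely for illustration, a point mass at zero with weight `pi0` plus a chi-square(1) variable multiplied by `scale`; the function names and the choice of the Benjamini-Hochberg correction are likewise assumptions.

```python
import math


def mixture_pvalue(stat, pi0, scale):
    """Upper-tail p-value P(T >= stat) under an ASSUMED null mixture:
    T = 0 with probability pi0, otherwise T = scale * chi-square(1).
    (Illustrative form only; the paper's mixture may differ.)"""
    if stat <= 0.0:
        return 1.0
    # Survival function of chi-square with 1 d.f. at x is erfc(sqrt(x / 2)).
    return (1.0 - pi0) * math.erfc(math.sqrt(stat / (2.0 * scale)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_snps(stats, pi0, scale, fdr=0.05):
    """Flag a site as a SNP when its adjusted p-value is at most `fdr`."""
    pvals = [mixture_pvalue(s, pi0, scale) for s in stats]
    return [q <= fdr for q in benjamini_hochberg(pvals)]
```

In this sketch `pi0` and `scale` are taken as already estimated; how MAFsnp estimates its mixture parameters is not described in the abstract. A call such as `call_snps(stats, pi0=0.5, scale=1.0)` returns one boolean per site.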

SUBMITTER: Hu J 

PROVIDER: S-EPMC4550471 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data.

Jiyuan Hu, Tengfei Li, Zidi Xiu, Hong Zhang

PLoS ONE, 2015-08-26, issue 8

Similar Datasets

| S-EPMC7182099 | biostudies-literature
| S-EPMC6499249 | biostudies-literature
| S-EPMC4528635 | biostudies-literature
| S-EPMC2978646 | biostudies-literature
| S-EPMC3907006 | biostudies-literature
| S-EPMC3711422 | biostudies-literature
| S-EPMC4697941 | biostudies-literature
| S-EPMC9234764 | biostudies-literature
| S-EPMC4914105 | biostudies-literature
| S-EPMC8803190 | biostudies-literature