
Dataset Information


Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic.


ABSTRACT: Sufficiently powered case-control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. Using simulated and real NGS data, we also demonstrate that the RVS method controls Type I error and has power comparable to the 'gold standard' analysis with the true underlying genotypes for both common and rare variants. An RVS R script and instructions can be found at strug.research.sickkids.ca and at https://github.com/strug-lab/RVS. Contact: lisa.strug@utoronto.ca. Supplementary data are available at Bioinformatics online.
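The substitution described in the abstract, replacing a hard genotype call with its expected value given the sequence data, can be illustrated with a minimal Python sketch. This is not the authors' R script. It assumes that per-sample genotype likelihoods P(D | G = g) for g = 0, 1, 2 are already available, and that a Hardy-Weinberg prior built from a supplied minor allele frequency is acceptable. The function name `expected_genotype` and its parameters are hypothetical.

```python
def expected_genotype(likelihoods, maf):
    """Posterior mean of the minor-allele count G in {0, 1, 2}.

    likelihoods: (P(D | G=0), P(D | G=1), P(D | G=2)) for one sample at one
        site, e.g. derived from aligned reads (assumed to be given).
    maf: assumed minor allele frequency; used here only to build a
        Hardy-Weinberg prior, which is an illustrative assumption.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    if len(likelihoods) != 3:
        raise ValueError("expected three genotype likelihoods")

    # Hardy-Weinberg prior over genotypes 0, 1, 2.
    prior = ((1.0 - maf) ** 2, 2.0 * maf * (1.0 - maf), maf ** 2)

    # Unnormalised posterior weights P(D | G=g) * P(G=g).
    weights = [lik * pr for lik, pr in zip(likelihoods, prior)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("likelihoods and prior give zero total probability")

    # E[G | D] = sum_g g * P(G=g | D).
    return sum(g * w for g, w in enumerate(weights)) / total
```

With uninformative reads (equal likelihoods) the function returns the prior mean, 2 × maf, rather than forcing a call. Informative reads pull the value toward 0, 1, or 2. This is the sense in which an expected value carries the uncertainty that a hard call discards.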

SUBMITTER: Derkach A 

PROVIDER: S-EPMC4103600 | biostudies-literature | 2014 Aug

REPOSITORIES: biostudies-literature


Publications

Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic.

Derkach A, Chiang T, Gong J, Addis L, Dobbins S, Tomlinson I, Houlston R, Pal DK, Strug LJ

Bioinformatics (Oxford, England), 2014-04-14, issue 15



Similar Datasets

| S-EPMC8793683 | biostudies-literature
| S-EPMC4795927 | biostudies-literature
| S-EPMC8311305 | biostudies-literature
| S-EPMC8549748 | biostudies-literature
| S-EPMC3072823 | biostudies-literature
| S-EPMC4143755 | biostudies-literature
| S-EPMC4491794 | biostudies-other
2019-08-20 | GSE135950 | GEO
| S-EPMC9508561 | biostudies-literature
| S-EPMC3605921 | biostudies-literature