Dataset Information

Mining gold dust under the genome wide significance level: a two-stage approach to analysis of GWAS.

ABSTRACT: We propose a two-stage approach to analyze genome-wide association data in order to identify a set of promising single-nucleotide polymorphisms (SNPs). In stage one, we select a list of top signals from single SNP analyses by controlling the false discovery rate. In stage two, we use the least absolute shrinkage and selection operator (LASSO) regression to reduce false positives. The proposed approach was evaluated using simulated quantitative traits based on genome-wide SNP data on 8,861 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study. Our first stage, targeted at controlling false negatives, yields better power than using a Bonferroni-corrected significance level. The LASSO regression reduces the number of significant SNPs in stage two: it removes false-positive SNPs and, because of linkage disequilibrium, also some true-positive SNPs at simulated causal loci. Interestingly, the LASSO regression preserves the power from stage one, i.e., the number of causal loci detected from the LASSO regression in stage two is almost the same as in stage one, while reducing false positives further. Real data on systolic blood pressure in the ARIC study were analyzed using our two-stage approach, which identified two significant SNPs, one of which was reported to be genome-wide significant in a meta-analysis containing a much larger sample size. On the other hand, a single SNP association scan did not yield any significant results.
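
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the Benjamini-Hochberg procedure for false discovery rate control (the abstract does not name the procedure), a large-sample normal approximation for the single-SNP test, a fixed LASSO penalty instead of a tuned one, and independently simulated SNPs with no linkage disequilibrium. All function names (simulate, single_snp_pvalues, benjamini_hochberg, lasso_cd, two_stage) are hypothetical.

```python
import math
import random

def simulate(n, p, causal, effect, maf=0.3, seed=0):
    """Toy data: independent 0/1/2 genotypes and a quantitative trait (assumed model)."""
    rng = random.Random(seed)
    X = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(p)] for _ in range(n)]
    y = [effect * sum(row[j] for j in causal) + rng.gauss(0.0, 1.0) for row in X]
    return X, y

def single_snp_pvalues(X, y):
    """Two-sided p-value per SNP from its marginal correlation (normal approximation)."""
    n = len(y)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    pvals = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in col))
        if sx == 0.0 or sy == 0.0:          # monomorphic SNP or constant trait
            pvals.append(1.0)
            continue
        r = sum((a - mx) * (b - my) for a, b in zip(col, y)) / (sx * sy)
        if r * r >= 1.0:                    # perfect correlation
            pvals.append(0.0)
            continue
        z = r * math.sqrt((n - 2) / (1.0 - r * r))
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return pvals

def benjamini_hochberg(pvals, q):
    """Indices of tests rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank                        # largest rank passing its threshold
    return sorted(order[:k])

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    norm = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            if norm[j] == 0.0:
                continue
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + norm[j] * beta[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / norm[j]  # soft threshold
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, cols[j])]
                beta[j] = new
    return beta

def two_stage(X, y, q=0.05, lam=0.05):
    """Stage one: FDR-selected SNPs. Stage two: SNPs kept by the LASSO among them."""
    stage1 = benjamini_hochberg(single_snp_pvalues(X, y), q)
    if not stage1:
        return [], []
    n = len(y)
    my = sum(y) / n
    yc = [v - my for v in y]
    means = [sum(row[j] for row in X) / n for j in stage1]
    Xs = [[row[j] - m for j, m in zip(stage1, means)] for row in X]
    beta = lasso_cd(Xs, yc, lam)
    stage2 = [j for j, b in zip(stage1, beta) if b != 0.0]
    return stage1, stage2

if __name__ == "__main__":
    X, y = simulate(n=500, p=50, causal=[0, 1], effect=0.5)
    stage1, stage2 = two_stage(X, y)
    print("stage one (FDR):", stage1)
    print("stage two (LASSO):", stage2)
```

In this sketch the stage-two list is always a subset of the stage-one list, which mirrors the role the abstract gives the LASSO: pruning the FDR-selected signals rather than finding new ones. In practice the penalty would be tuned (for example by cross-validation) rather than fixed, and real genotypes would be correlated through linkage disequilibrium.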

SUBMITTER: Shi G

PROVIDER: S-EPMC3624896 | biostudies-literature | 2011 Feb

REPOSITORIES: biostudies-literature


Publications

Mining gold dust under the genome wide significance level: a two-stage approach to analysis of GWAS.

Shi G, Boerwinkle E, Morrison AC, Gu CC, Chakravarti A, Rao DC

Genetic Epidemiology, 2010-12-31, issue 2

PMID: 21254218

Similar Datasets

Project description:

Background: Mechanized dry seeded rice can save both labour and water resources. Rice seedling establishment is sensitive to sowing depth, while mesocotyl elongation facilitates the emergence of deeply sown seeds.

Results: A set of 270 rice accessions, including 170 from the mini-core collection of Chinese rice germplasm (C Collection) and 100 varieties used in a breeding program for drought resistance (D Collection), was screened for mesocotyl lengths of seedlings grown in water (MLw) in darkness and in 5 cm sand culture (MLs). Twenty-six accessions (10.53%) had MLw longer than 1.0 cm. Eleven accessions had the highest mesocotyl lengths, i.e. 1.4 to 5.05 cm of MLw and 3.0 to 6.4 cm in 10 cm sand culture, including 7 upland landraces or varieties. The genotypic data of 1,019,883 SNPs were developed by re-sequencing of those accessions. A whole-genome SNP array (Rice SNP50) was used to genotype 24 accessions as a validation panel; on average, 98.41% of SNPs were consistent with the re-sequencing data. GWAS based on a compressed mixed linear model was conducted using GAPIT. Based on a threshold of -log(P) ≥ 8.0, 13 loci on rice chromosomes 1, 3, 4, 5, 6 and 9 were associated with MLw. Three associated loci, on chromosomes 3, 6 and 10, were detected for MLs. Under a relaxed threshold (-log(P) ≥ 7.0), a set of 99 SNPs associated with MLw was located in intergenic regions or at different positions within 36 annotated genes, including one cullin gene and one growth regulating factor gene.

Conclusions: A higher proportion of elongated mesocotyls, and greater elongation, were observed in the mini-core collection of rice germplasm and in upland rice landraces or varieties, possibly accounting for the correlation between mesocotyl elongation and drought resistance. GWAS found 13 loci for mesocotyl length measured in dark germination, which confirmed the previously reported co-location of two QTLs across populations and experiments. Associated SNPs hit 36 annotated genes, including function-matching candidates such as cullin and GRF. The germplasm with elongated mesocotyl, especially upland landraces or varieties, and the associated SNPs could be useful in further studies and in breeding of mechanized dry seeded rice.
