Project description:We report a genome-wide assessment of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) in schizophrenia. We investigated SNPs using 871 patients and 863 controls, following up the top hits in four independent cohorts comprising 1,460 patients and 12,995 controls, all of European origin. We found no genome-wide significant associations, nor could we provide support for any previously reported candidate gene or genome-wide associations. We went on to examine CNVs using a subset of 1,013 cases and 1,084 controls of European ancestry, and a further set of 60 cases and 64 controls of African ancestry. We found that eight cases and zero controls carried deletions greater than 2 Mb, of which two, at 8p22 and 16p13.11-p12.4, are newly reported here. A further evaluation of 1,378 controls identified no deletions greater than 2 Mb, suggesting a high prior probability of disease involvement when such deletions are observed in cases. We also provide further evidence for some smaller, previously reported, schizophrenia-associated CNVs, such as those in NRXN1 and APBA2. We could not provide strong support for the hypothesis that schizophrenia patients have a significantly greater "load" of large (>100 kb), rare CNVs, nor could we find common CNVs that associate with schizophrenia. Finally, we did not provide support for the suggestion that schizophrenia-associated CNVs may preferentially disrupt genes in neurodevelopmental pathways. Collectively, these analyses provide the first integrated study of SNPs and CNVs in schizophrenia and support the emerging view that rare deleterious variants may be more important in schizophrenia predisposition than common polymorphisms. While our analyses do not suggest that implicated CNVs impinge on particular key pathways, we do support the contribution of specific genomic regions in schizophrenia, presumably due to recurrent mutation. On balance, these data suggest that very few schizophrenia patients share identical genomic causation, potentially complicating efforts to personalize treatment regimens.
Project description:BackgroundRecently mixed linear models are used to address the issue of "missing" heritability in traditional Genome-wide association studies (GWAS). The models assume that all single-nucleotide polymorphisms (SNPs) are associated with the phenotypes of interest. However, it is more common that only a small proportion of SNPs have significant effects on the phenotypes, while most SNPs have no or very small effects. To incorporate this feature, we propose an efficient Hierarchical Bayesian Model (HBM) that extends the existing mixed models to enforce automatic selection of significant SNPs. The HBM models the SNP effects using a mixture distribution of a point mass at zero and a normal distribution, where the point mass corresponds to those non-associative SNPs.ResultsWe estimate the HBM using Gibbs sampling. The estimation performance of our method is first demonstrated through two simulation studies. We make the simulation setups realistic by using parameters fitted on the Framingham Heart Study (FHS) data. The simulation studies show that our method can accurately estimate the proportion of SNPs associated with the simulated phenotype and identify these SNPs, as well as adapt to certain model mis-specification than the standard mixed models. In addition, we analyze data from the FHS and the Health and Retirement Study (HRS) to study the association between Body Mass Index (BMI) and SNPs on Chromosome 16, and replicate the identified genetic associations. The analysis of the FHS data identifies 0.3% SNPs on Chromosome 16 that affect BMI, including rs9939609 and rs9939973 on the FTO gene. These two SNPs are in strong linkage disequilibrium with rs1558902 (Rsq =0.901 for rs9939609 and Rsq =0.905 for rs9939973), which has been reported to be linked with obesity in previous GWAS. We then replicate the findings using the HRS data: the analysis finds 0.4% of SNPs associated with BMI on Chromosome 16. Furthermore, around 25% of the genes that are identified to be associated with BMI are common between the two studies.ConclusionsThe results demonstrate that the HBM and the associated estimation algorithm offer a powerful tool for identifying significant genetic associations with phenotypes of interest, among a large number of SNPs that are common in modern genetics studies.
Project description:MotivationThe quality control (QC) filtering of single nucleotide polymorphisms (SNPs) is an important step in genome-wide association studies to minimize potential false findings. SNP QC commonly uses expert-guided filters based on QC variables [e.g. Hardy-Weinberg equilibrium, missing proportion (MSP) and minor allele frequency (MAF)] to remove SNPs with insufficient genotyping quality. The rationale of the expert filters is sensible and concrete, but its implementation requires arbitrary thresholds and does not jointly consider all QC features.ResultsWe propose an algorithm that is based on principal component analysis and clustering analysis to identify low-quality SNPs. The method minimizes the use of arbitrary cutoff values, allows a collective consideration of the QC features and provides conditional thresholds contingent on other QC variables (e.g. different MSP thresholds for different MAFs). We apply our method to the seven studies from the Wellcome Trust Case Control Consortium and the major depressive disorder study from the Genetic Association Information Network. We measured the performance of our method compared to the expert filters based on the following criteria: (i) percentage of SNPs excluded due to low quality; (ii) inflation factor of the test statistics (lambda); (iii) number of false associations found in the filtered dataset; and (iv) number of true associations missed in the filtered dataset. The results suggest that with the same or fewer SNPs excluded, the proposed algorithm tends to give a similar or lower value of lambda, a reduced number of false associations, and retains all true associations.AvailabilityThe algorithm is available at http://www4.stat.ncsu.edu/jytzeng/software.php
Project description:Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
Project description:Psychiatric disorders such as schizophrenia are commonly accompanied by cognitive impairments that are treatment resistant and crucial to functional outcome. There has been great interest in studying cognitive measures as endophenotypes for psychiatric disorders, with the hope that their genetic basis will be clearer. To investigate this, we performed a genome-wide association study involving 11 cognitive phenotypes from the Cambridge Neuropsychological Test Automated Battery. We showed these measures to be heritable by comparing the correlation in 100 monozygotic and 100 dizygotic twin pairs. The full battery was tested in approximately 750 subjects, and for spatial and verbal recognition memory, we investigated a further 500 individuals to search for smaller genetic effects. We were unable to find any genome-wide significant associations with either SNPs or common copy number variants. Nor could we formally replicate any polymorphism that has been previously associated with cognition, although we found a weak signal of lower than expected P-values for variants in a set of 10 candidate genes. We additionally investigated SNPs in genomic loci that have been shown to harbor rare variants that associate with neuropsychiatric disorders, to see if they showed any suggestion of association when considered as a separate set. Only NRXN1 showed evidence of significant association with cognition. These results suggest that common genetic variation does not strongly influence cognition in healthy subjects and that cognitive measures do not represent a more tractable genetic trait than clinical endpoints such as schizophrenia. We discuss a possible role for rare variation in cognitive genomics.
Project description:This study investigated the possible association between single nucleotide polymorphism (SNP) sites on a genome wide level and the presence of polycystic ovary syndrome (PCOS) in a local population. Patients treated for PCOS in the outpatient clinic of the reproductive medicine center of Changzhou Maternal and Child Health Care Hospital (affiliated to Nanjing Medical University) from January of 2010 to December 2012 were selected. Female patients affected by infertility due to simple oviduct reasons or male factors, during the same period, were enrolled for the control group. A genome-wide association study was performed. Specific experimental steps included extraction of the total human DNA and optimization of PCR amplification of target genes; flight mass spectrometry for genotyping; and statistical analyses of sequencing results. By primary selection and secondary verification at two stages in the experiment, three SNP sites were found to contain significantly different allele frequencies between the patient and control groups (P<0.05): rs346795081 on THADA, rs346803513 on DENND1A and rs346999236 on TOX3. The average expression levels at the three discovered SNPs sites were significantly different between the patient and the control groups, indicating their correlation with PCOS, and the possible role of their corresponding genes on the pathogenesis of the disease.
Project description:PurposeGenome-wide-association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) that are associated with an increased risk of breast cancer. Most of these studies were conducted primarily in postmenopausal breast cancer patients. Therefore, we set out to assess whether or not these breast cancer variants are also associated with an elevated risk of breast cancer in young premenopausal patients.MethodsIn 451 women of European ancestry who had prospectively enrolled in a longitudinal cohort study for women diagnosed with breast cancer at or under age 40, we genotyped 44 SNPs that were previously associated with breast cancer risk. A control group was comprised of 1142 postmenopausal healthy women from the Nurses' Health Study (NHS). We assessed if the frequencies of the adequately genotyped SNPs differed significantly (p≤0.05) between the cohort of young breast cancer patients and postmenopausal controls, and then we corrected for multiple testing.ResultsGenotyping of the controls or cases was inadequate for comparisons between the groups for seven of the 44 SNPs. 9 of the remaining 37 were associated with breast cancer risk in young women with a p-value <0.05: rs10510102, rs1219648, rs13387042, rs1876206, rs2936870, rs2981579, rs3734805, rs3803662 and rs4973768. The directions of these associations were consistent with those in postmenopausal women. However, after correction for multiple testing (Benjamini Hochberg) none of the results remained statistically significant.ConclusionAfter correction for multiple testing, none of the alleles for postmenopausal breast cancer were clearly associated with risk of premenopausal breast cancer in this relatively small study.
Project description:ObjectivesGenome-wide association studies (GWASs) have revealed many SNPs and genes associated with osteoporosis. However, influence of these SNPs and genes on the predisposition to osteoporosis is not fully understood. We aimed to identify osteoporosis GWASs-associated SNPs potentially influencing the binding affinity of transcription factors and miRNAs, and reveal enrichment signaling pathway and "hub" genes of osteoporosis GWAS-associated genes.MethodsWe conducted multiple computational analyses to explore function and mechanisms of osteoporosis GWAS-associated SNPs and genes, including SNP conservation analysis and functional annotation (influence of SNPs on transcription factors and miRNA binding), gene ontology analysis, pathway analysis and protein-protein interaction analysis.ResultsOur results suggested that a number of SNPs potentially influence the binding affinity of transcription factors (NFATC2, MEF2C, SOX9, RUNX2, ESR2, FOXA1 and STAT3) and miRNAs. Osteoporosis GWASs-associated genes showed enrichment of Wnt signaling pathway, basal cell carcinoma and Hedgehog signaling pathway. Highly interconnected "hub" genes revealed by interaction network analysis are RUNX2, SP7, TNFRSF11B, LRP5, DKK1, ESR1 and SOST.ConclusionsOur results provided the targets for further experimental assessment and further insight on osteoporosis pathophysiology.
Project description:Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). The traditional SNP-wise approach along with multiple testing adjustment is over-conservative and lack of power in many GWASs. In this article, we proposed a model-based clustering method that transforms the challenging high-dimension-small-sample-size problem to low-dimension-large-sample-size problem and borrows information across SNPs by grouping SNPs into three clusters. We pre-specify the patterns of clusters by minor allele frequencies of SNPs between cases and controls, and enforce the patterns with prior distributions. In the simulation studies our proposed novel model outperforms traditional SNP-wise approach by showing better controls of false discovery rate (FDR) and higher sensitivity. We re-analyzed two real studies to identifying SNPs associated with severe bortezomib-induced peripheral neuropathy (BiPN) in patients with multiple myeloma (MM). The original analysis in the literature failed to identify SNPs after FDR adjustment. Our proposed method not only detected the reported SNPs after FDR adjustment but also discovered a novel BiPN-associated SNP rs4351714 that has been reported to be related to MM in another study.
Project description:Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.