Dataset Information

Single nucleotide polymorphism (SNP)-strings: an alternative method for assessing genetic associations.

ABSTRACT:

Background

Genome-wide association studies (GWAS) identify disease associations for single-nucleotide polymorphisms (SNPs) at scattered genomic locations. However, a given SNP frequently resides on several different SNP-haplotypes, only some of which may be disease-associated. Because the SNP allele is then carried by both risk and non-risk haplotypes, its association with disease is diluted, which lowers the observed odds ratio.
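The dilution effect can be made concrete with a toy calculation. The chromosome counts below are invented purely for illustration (they are not data from this study), and the haplotype labels H1, H2 and H3 are hypothetical: H1 carries SNP allele A and is disease-associated, H2 also carries A but is neutral, and H3 carries the other allele. The sketch simply applies the standard 2x2 odds-ratio formula, OR = (a·d)/(b·c), once to the haplotype and once to the SNP allele.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio (a*d)/(b*c) for a 2x2 table of chromosome counts."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts: 1000 case and 1000 control chromosomes.
# H2:H3 is 1:4 in both groups, so H2 itself carries no risk.
cases = {"H1": 280, "H2": 144, "H3": 576}
controls = {"H1": 100, "H2": 180, "H3": 720}

n_cases = sum(cases.values())
n_controls = sum(controls.values())

# Risk haplotype H1 versus all other haplotypes.
haplotype_or = odds_ratio(cases["H1"], n_cases - cases["H1"],
                          controls["H1"], n_controls - controls["H1"])

# SNP allele A (carried by H1 and H2) versus the other allele (H3).
allele_or = odds_ratio(cases["H1"] + cases["H2"], cases["H3"],
                       controls["H1"] + controls["H2"], controls["H3"])

print(f"OR for risk haplotype H1: {haplotype_or:.2f}")  # 3.50
print(f"OR for SNP allele A:      {allele_or:.2f}")     # 1.89
```

Because the neutral haplotype H2 also carries allele A, testing the SNP alone roughly halves the apparent effect in this example, even though the underlying haplotype effect is unchanged.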

Methodology/principal findings

Here we develop a method to identify the two SNP-haplotypes that combine to produce each person's SNP-genotype over specified chromosomal segments. Two multiple sclerosis (MS)-associated genetic regions were modeled: DRB1 (a Class II molecule of the major histocompatibility complex) and MMEL1 (an endopeptidase that degrades both neuropeptides and β-amyloid). For each locus, we considered sets of eleven adjacent SNPs surrounding the putative disease-associated gene and spanning ∼200 kb of DNA. The SNP information was converted into an ordered set of eleven numbers (a subject-vector) according to whether a person had zero, one, or two copies of a particular SNP variant at each sequential SNP location. SNP-strings were defined as ordered combinations of eleven binary values (0 or 1), each representing a haplotype, such that two SNP-strings combine (summing position by position) to form the observed subject-vector. Subject-vectors were resolved using probabilistic methods. In both regions, only a small number of distinct SNP-strings were present. We compared our method with the SHAPEIT-2 phasing algorithm. When only the SNP information spanning 200 kb was used, SHAPEIT-2 was inaccurate. When the SHAPEIT-2 window was increased to 2,000 kb, the concordance between the two methods in both eleven-SNP regions exceeded 99%, suggesting that both methods were quite accurate in these regions. Nevertheless, concordance was not uniformly high across the entire DNA span; rather, it showed alternating peaks and valleys. Moreover, in the valleys of poor concordance, SHAPEIT-2 was also inconsistent with itself, suggesting that the SNP-string method is the more accurate of the two across the entire region.
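The abstract states only that subject-vectors "were resolved using probabilistic methods" and does not name the algorithm. As a hedged sketch, the block below (a) builds a subject-vector from genotype calls, (b) enumerates every unordered pair of 0/1 SNP-strings whose position-wise sum reproduces that vector, and (c) resolves ambiguous vectors with a textbook expectation-maximization (EM) estimate of SNP-string frequencies under Hardy-Weinberg equilibrium. EM is a swapped-in standard technique, not necessarily the authors' method, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product


def subject_vector(genotypes, variant_alleles):
    """Copies (0, 1 or 2) of the chosen variant allele at each ordered SNP position."""
    return tuple(g.count(v) for g, v in zip(genotypes, variant_alleles))


def candidate_pairs(vector):
    """All unordered pairs of 0/1 SNP-strings whose position-wise sum equals `vector`."""
    het = [i for i, x in enumerate(vector) if x == 1]
    base = [1 if x == 2 else 0 for x in vector]  # homozygous positions are fixed
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # each heterozygous position goes to one string or the other
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pair_weight(freq, pair):
    """Hardy-Weinberg probability of the genotype formed by an unordered pair of strings."""
    a, b = pair
    return freq[a] * freq[b] * (1 if a == b else 2)


def resolve_em(vectors, n_iter=50):
    """EM estimate of SNP-string frequencies, then the most probable pair per subject."""
    cands = [candidate_pairs(v) for v in vectors]
    strings = {h for pairs in cands for pair in pairs for h in pair}
    freq = {h: 1.0 / len(strings) for h in strings}  # uniform starting frequencies
    for _ in range(n_iter):
        counts = Counter()
        for pairs in cands:  # E-step: share each subject's two strings among its candidates
            weights = [pair_weight(freq, p) for p in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        freq = {h: counts[h] / (2 * len(vectors)) for h in strings}  # M-step
    best = [max(pairs, key=lambda p: pair_weight(freq, p)) for pairs in cands]
    return freq, best


# Toy example with three SNPs (the method itself used eleven):
freq, best = resolve_em([(2, 2, 0), (0, 0, 0), (1, 1, 0)])
print(best[2])  # ((0, 0, 0), (1, 1, 0)): the pair built from strings seen in other subjects
```

With eleven SNPs, a fully heterozygous subject-vector has 2^10 = 1,024 candidate pairs, so exhaustive enumeration stays cheap at this window size. The EM step assumes Hardy-Weinberg equilibrium and unrelated subjects; the authors' own probabilistic procedure may differ.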
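The abstract does not define how concordance between the SNP-string method and SHAPEIT-2 was scored along the span. One common way to localize disagreement, assumed here rather than taken from the paper, is to check, for each pair of adjacent SNPs at which a subject is heterozygous, whether two methods place the variant alleles on the same haplotype. A per-interval agreement profile of this kind can show peaks and valleys of concordance. Both functions are hypothetical and assume the two methods report identical genotypes for each subject.

```python
def pair_concordance(calls_a, calls_b):
    """Fraction of subjects whose unordered haplotype pair is identical in both methods."""
    same = sum(tuple(sorted(a)) == tuple(sorted(b)) for a, b in zip(calls_a, calls_b))
    return same / len(calls_a)


def adjacent_phase_concordance(calls_a, calls_b):
    """Per adjacent-SNP interval, fraction of doubly heterozygous subjects phased alike.

    Each call is an (h1, h2) pair of equal-length 0/1 tuples. An interval gets None
    when no subject is heterozygous at both of its SNPs.
    """
    n_sites = len(calls_a[0][0])
    profile = []
    for i in range(n_sites - 1):
        agree = total = 0
        for (a1, a2), (b1, _b2) in zip(calls_a, calls_b):
            if a1[i] != a2[i] and a1[i + 1] != a2[i + 1]:  # heterozygous at both SNPs
                total += 1
                # True when both variant alleles sit on the same haplotype (cis);
                # this does not change if the labels h1 and h2 are swapped.
                agree += (a1[i] == a1[i + 1]) == (b1[i] == b1[i + 1])
        profile.append(agree / total if total else None)
    return profile


# Hypothetical phased calls for two subjects at three SNPs.
method_a = [((1, 0, 1), (0, 1, 0)), ((1, 1, 0), (0, 0, 0))]
method_b = [((0, 1, 0), (1, 0, 1)), ((1, 0, 0), (0, 1, 0))]
print(pair_concordance(method_a, method_b))            # 0.5
print(adjacent_phase_concordance(method_a, method_b))  # [0.5, 1.0]
```

Scoring each interval separately, rather than only whole haplotype pairs, is what allows a dip in agreement to be tied to a specific stretch of the DNA span.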

Conclusions/significance

Accurate haplotype identification will enhance the detection of genetic associations. The SNP-string method provides a simple means to accomplish this and can be extended to cover larger genomic regions, thereby improving the power of GWAS, including studies that have already been published.

SUBMITTER: Goodin DS

PROVIDER: S-EPMC3984082 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Single nucleotide polymorphism (SNP)-strings: an alternative method for assessing genetic associations.

Douglas S. Goodin, Pouya Khankhanian

PLoS ONE, 11 April 2014, issue 4

PMID: 24727690

Similar Datasets

Project description: Litchi is an important fruit tree in tropical and subtropical areas of the world. However, litchi cultivar nomenclature is widely confused, and detailed information on the genetic relationships among litchi germplasm remains unclear. In the present study, the potential of single nucleotide polymorphisms (SNPs) for identifying 96 representative litchi accessions in China and their genetic relationships was evaluated using 155 SNPs evenly spaced across the litchi genome. Ninety SNPs with minor allele frequencies above 0.05 and a good genotyping success rate were used for further analysis. A relatively high level of genetic variation was observed among litchi accessions, as quantified by the expected heterozygosity (He = 0.305). SNP-based multilocus matching identified two synonymous groups: 'Heiye' and 'Wuye', and 'Chengtuo' and 'Baitangli 1'. A subset of 14 SNPs was sufficient to distinguish all non-redundant litchi genotypes, and these SNPs proved highly stable in repeated analyses of a selected group of cultivars. Unweighted pair-group method with arithmetic averages (UPGMA) cluster analysis divided the accessions into four main groups, corresponding to extremely early-, early-, middle-, and late-maturing traits, indicating that the fruit maturation period should be considered the primary criterion for litchi taxonomy. STRUCTURE analysis detected two subpopulations among the accessions, and accessions with extremely early- and late-maturing traits showed membership coefficients above 0.99 for Cluster 1 and Cluster 2, respectively. Accessions with early- and middle-maturing traits were identified as admixed forms with varying levels of membership shared between the two clusters, indicating their hybrid origin during litchi domestication.
The results of this study will benefit litchi germplasm conservation programs and facilitate maximum genetic gains in litchi breeding programs.
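For readers unfamiliar with the statistic, the expected heterozygosity of a biallelic SNP with allele frequency p is conventionally 2p(1 − p), averaged over loci (the simple form, without a small-sample correction). The description does not give the exact formula or the call-rate threshold used, so the sketch below assumes that standard definition, applies the stated minor-allele-frequency filter (> 0.05), and uses a hypothetical 0.9 call-rate cutoff. Genotypes are coded as 0/1/2 allele dosages, with None marking a failed call; the function names are illustrative only.

```python
from typing import Optional, Sequence, Tuple


def allele_frequency(dosages: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (frequency of the counted allele, call rate) from 0/1/2 dosages."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0, 0.0
    return sum(called) / (2 * len(called)), len(called) / len(dosages)


def mean_expected_heterozygosity(loci, min_maf=0.05, min_call_rate=0.9):
    """Average 2p(1-p) over loci passing the MAF and (hypothetical) call-rate filters."""
    he_values = []
    for dosages in loci:
        p, call_rate = allele_frequency(dosages)
        if min(p, 1 - p) > min_maf and call_rate >= min_call_rate:
            he_values.append(2 * p * (1 - p))
    if not he_values:
        return float("nan"), 0
    return sum(he_values) / len(he_values), len(he_values)


# Three hypothetical loci: one informative, one monomorphic, one with too many failed calls.
he, n_kept = mean_expected_heterozygosity([[0, 1, 2, 1], [0, 0, 0, 0], [1, 1, None, 1]])
print(he, n_kept)  # 0.5 1
```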

Project description: Dear Editor, The recent article by Mohammadzadeh et al.[1] in the latest issue of this Journal showed that the T allele of the ADIPOQ +276G/T SNP is more strongly associated with an increased risk of coronary artery disease (CAD) in subjects with type 2 diabetes. Adipocytes have been described in the myocardial tissue of CAD patients, and their role has recently been discussed[2,3]. CAD susceptibility associated with polymorphisms in the adiponectin (ADIPOQ) gene has been reported for the 3'-UTR, which harbours several genetic loci associated with metabolic risk and atherosclerosis[4]. However, previous studies have shown that the +276G>T SNP haplotype was associated with a decreased risk of CAD after adjustment for potential confounding factors, so the question remains controversial[5]. This evidence should be considered in light of the role played by adipocytes and adiponectin in cardiac physiology. In particular, in hypertensive disorder complicating pregnancy (HDCP), investigators examining the population frequencies of alleles, genotypes, and haplotypes of two single nucleotide polymorphisms (SNPs), +45T>G (rs2241766) and +276G>T (rs1501299), found that the +276 TT genotype was significantly associated with protection against HDCP compared with the pooled G genotypes[6]. Moreover, the same +276G/T SNP haplotype was strongly associated with biliary atresia, an intractable neonatal inflammatory and obliterative cholangiopathy that leads to progressive fibrosis and cirrhosis[7]. CAD is closely related to adiponectin biology. Adiponectin isoforms, however, seem to be associated not with CAD severity but with glucose metabolism and its impairment[8]. In the paper by Mohammadzadeh et al.[1], the T allele of the +276G/T SNP haplotype is highly associated with CAD in subjects with type 2 diabetes, but this association should be reappraised, since it may relate more to diabetes than to CAD.
The association of the T allele at this SNP with CAD may be an indirect consequence of type 2 diabetes, as reported by others[9], or it may be a direct marker in patients with CAD[10]. The paper by Mohammadzadeh et al.[1] draws on data from elsewhere in the literature but raises important concerns about the suitability of ADIPOQ SNPs for diagnosing susceptibility to CAD and about their relationship with plasma adiponectin levels. In normal, non-diabetic, normoglycemic subjects, this relationship does not seem to hold. The question, therefore, is how well this SNP haplotype predicts the risk of metabolic syndrome and CAD onset in young healthy subjects. The role of adiponectin in cardiovascular physiology may depend on its ability to target adiponectin receptors and to negatively regulate obesity. Some authors have reported an absence of correlation between circulating adiponectin levels and biochemical markers, particularly lipoproteins, in healthy volunteers, and suggested that SNP +276G>T has an independent effect on adiponectin levels and on lipoprotein metabolism[11]. In contrast, adiponectin genetic variants and SNP +276G>T were associated with increased susceptibility to type 2 diabetes and impaired plasma glucose[12]. The interesting study by Mohammadzadeh et al.[1] suggests that the ADIPOQ +276G>T SNP is related to susceptibility to impaired glucose metabolism, and only indirectly to lipid metabolism and fat-related cardiovascular damage.

Project description: Background: Croatia is a geographically small country with a remarkable diversity of cultivated and spontaneous grapevines. Local germplasm has been characterised with microsatellite markers, but a detailed analysis based on single nucleotide polymorphisms (SNPs) is still lacking. Here we characterise the genetic diversity of 149 accessions from three germplasm repositories and four natural sites using 516,101 SNPs to identify complete parent-offspring trios and their relations with spontaneous populations, offering a proof of concept for the use of reduced-representation genome sequencing in population genetics and genome-wide association studies (GWAS). Results: Principal component analysis revealed a clear discontinuity between cultivated (V. vinifera subsp. sativa) and spontaneous grapevines, supporting the notion that the latter represent local populations of the wild progenitor (V. vinifera subsp. sylvestris). ADMIXTURE identified three ancestry components. Of the two sativa components, one predominates in cultivars grown in northern Adriatic and Continental Croatia, and the other in cultivars grown in Dalmatia (i.e. central and southern Adriatic Croatia). A sylvestris component, which is predominant in accessions from spontaneous populations, is a minor ancestry component in cultivated accessions. TREEMIX provided evidence of unidirectional migration from the vineyards to natural sites, suggesting that gene flow has gone preferentially from the introduced domesticated germplasm into local wild populations rather than vice versa. Identity-by-descent analysis revealed an extensive kinship network, including 14 complete parent-offspring trios (involving only cultivated accessions) and six full-sibling relationships, and invalidated a presumed pedigree of 'Plavac Mali', one of the most important varieties in Croatia.
Despite this strong population structure, significant associations were found between 143 SNPs and berry skin colour and between 2 SNPs and leaf hairiness, in two previously known genomic regions. Conclusions: The clear genetic separation between Croatian cultivars and sylvestris ruled out the hypothesis that those cultivars originated from local domestication events. On the other hand, the evidence of crop-to-wild gene flow signals an urgent need to adopt conservation strategies that preserve the residual genetic integrity of wild relatives. This reduced-representation genome sequencing protocol enables accurate pedigree reconstruction in grapevine and can be recommended for GWAS experiments.

OmicsDI is part of the ELIXIR infrastructure