Project description: Polymorphic inversions contribute to adaptation and phenotypic variation. However, large multi-centric association studies of inversions remain challenging. We present scoreInvHap, a method to genotype inversions from SNP data for genome-wide association studies (GWASs) that overcomes important limitations of current methods and outperforms them in accuracy and applicability. scoreInvHap calls each individual's inversion genotype from a similarity score between its SNPs and those of experimentally validated references. It can be applied to different sources of SNP data, including those with low SNP coverage such as exome sequencing, and is easily adapted to genotype new inversions, in humans or in other species. We present 20 human inversions that can be reliably and easily genotyped with scoreInvHap to investigate their role in complex human traits, and illustrate its use in a first genome-wide association study of experimentally validated human inversions. scoreInvHap is implemented in R and is freely available from Bioconductor.
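To make the idea of calling a genotype from a similarity score concrete, here is a minimal Python sketch. It is not scoreInvHap's scoring (the package is written in R and defines its own score); it simply assumes that each candidate genotype (NN, NI, II) has a hypothetical reference dosage profile at a few SNPs and picks the reference the sample resembles most.

```python
from typing import Dict, List, Optional, Tuple

# Dosage encoding: number of copies (0, 1, 2) of the alternative allele at a SNP.
Dosages = List[Optional[int]]


def similarity(sample: Dosages, reference: List[int]) -> float:
    """Mean per-SNP agreement between a sample and a reference dosage profile.

    An exact match scores 1, a one-allele difference 0.5, and opposite
    homozygotes 0. Missing sample calls (None) are skipped.
    """
    pairs = [(s, r) for s, r in zip(sample, reference) if s is not None]
    if not pairs:
        raise ValueError("no non-missing SNPs to compare")
    return sum(1 - abs(s - r) / 2 for s, r in pairs) / len(pairs)


def call_inversion(
    sample: Dosages, references: Dict[str, List[int]]
) -> Tuple[str, Dict[str, float]]:
    """Return the reference genotype with the highest similarity, plus all scores."""
    scores = {genotype: similarity(sample, ref) for genotype, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference profiles for one inversion (NN = standard homozygote,
# NI = heterozygote, II = inverted homozygote) at five tag SNPs.
REFERENCES = {
    "NN": [0, 0, 2, 2, 0],
    "NI": [1, 1, 1, 1, 1],
    "II": [2, 2, 0, 0, 2],
}

if __name__ == "__main__":
    genotype, scores = call_inversion([1, 1, None, 1, 1], REFERENCES)
    print(genotype, scores)
```

Because each SNP is scored independently, this toy score ignores linkage between SNPs; a practical caller would likely weight SNPs by how well they tag the inversion.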
Project description:
Objective: The admixture of domestic pig into wild boar populations has, until now, been monitored by cytogenetic analysis. Although a first-generation hybrid can be recognized by its 37-chromosome karyotype, the cytogenetic method cannot detect advanced intercrosses. The aim of this study is therefore to evaluate single nucleotide polymorphism (SNP) markers as an alternative technology for characterizing recent or past hybridization between the two subspecies. The final goal is to develop a molecular diagnostic tool.
Data description: The Geneseek Genomic Profiler High-Density porcine BeadChip (GGP70KHD, Illumina, USA), comprising 68,516 porcine SNPs, was used to genotype 362 wild boars with diverse chromosomal statuses, collected from different areas and breeding environments in France. We generated between 62,192 and 64,046 genotypes per wild boar. The dataset may be useful to the community (i) for developing molecular tools to evaluate the admixture of domestic pig into wild boar populations, and (ii) for genetic diversity studies including wild boar or phylogenetic analyses of Suidae populations. Raw data files and a processed data matrix were deposited in ArrayExpress on the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) data portal under accession number E-MTAB-10591.
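The study releases genotypes rather than a method, but one common way SNP data are used to quantify hybridization is a supervised ancestry estimate against reference panels. The sketch below is a hypothetical illustration, not the authors' diagnostic tool: it assumes allele frequencies are known for a domestic-pig panel and a wild-boar panel, treats SNPs as independent, and finds by grid search the pig ancestry proportion q that maximizes the binomial likelihood of one animal's genotypes.

```python
import math
from typing import Optional, Sequence


def log_likelihood(
    q: float,
    genotypes: Sequence[Optional[int]],
    p_pig: Sequence[float],
    p_boar: Sequence[float],
) -> float:
    """Binomial log-likelihood of an individual's genotypes given ancestry q.

    genotypes: copies (0, 1, 2) of the counted allele per SNP, None if missing.
    p_pig / p_boar: frequency of that allele in the two (assumed) reference panels.
    """
    total = 0.0
    for g, pp, pb in zip(genotypes, p_pig, p_boar):
        if g is None:
            continue
        # Expected allele frequency for an animal with pig ancestry proportion q.
        p = q * pp + (1 - q) * pb
        p = min(max(p, 1e-9), 1 - 1e-9)  # keep away from 0 and 1 to avoid log(0)
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def estimate_pig_ancestry(
    genotypes: Sequence[Optional[int]],
    p_pig: Sequence[float],
    p_boar: Sequence[float],
    steps: int = 100,
) -> float:
    """Grid-search maximum-likelihood estimate of the domestic-pig proportion q."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, p_pig, p_boar))
```

Under this simple model a first-generation hybrid would be expected near q = 0.5, while animals from advanced intercrosses would fall at other intermediate values that a karyotype cannot reveal. A real analysis would also need to handle linkage between SNPs and uncertainty in the panel frequencies.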
Project description: With the increasing demand for higher-throughput single nucleotide polymorphism (SNP) genotyping, the quantity of genomic DNA often falls short of the number of assays required. We investigated the use of degenerate oligonucleotide primed polymerase chain reaction (DOP-PCR) to generate template for our SNP genotyping method, template-directed dye-terminator incorporation with fluorescence polarization detection. DOP-PCR uses a degenerate primer (5'-CCGACTCGAGNNNNNNATGTGG-3') to produce non-specific, uniform amplification of DNA, and this approach has been applied successfully to microsatellite genotyping. We compared genotyping with DOP-PCR-amplified genomic DNA as template against genotyping with unamplified genomic DNA. Results were analyzed with respect to feasibility, allele loss, genotyping accuracy and storage conditions in a high-throughput genotyping environment. DOP-PCR yielded satisfactory results overall, with some loss in the accuracy and quality of genotype assignments. The accuracy and quality of genotypes generated from DOP-PCR template also depended on storage conditions. Adding carrier DNA to a final concentration of 10 ng/µl improved results. In conclusion, we have successfully adopted DOP-PCR as a standard process to amplify our genomic DNA collection for subsequent SNP genotyping.
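Each N in the DOP-PCR primer is a degenerate position that can be any of the four bases, so the primer is really a pool of sequences. The short sketch below, using the standard IUPAC nucleotide codes, counts and enumerates that pool; for the six Ns in 5'-CCGACTCGAGNNNNNNATGTGG-3' it gives 4^6 = 4,096 distinct 22-mers.

```python
from itertools import product
from math import prod
from typing import List

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> List[str]:
    """Enumerate every concrete sequence in the primer pool."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


DOP_PRIMER = "CCGACTCGAGNNNNNNATGTGG"

if __name__ == "__main__":
    print(degeneracy(DOP_PRIMER))   # 4096
    print(expand(DOP_PRIMER)[0])    # CCGACTCGAGAAAAAAATGTGG
```

The fixed 5' and 3' ends are shared by every member of the pool; only the six central positions vary, which is what lets the primer anneal at many sites across the genome.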
Project description:
Introduction: Heel pricks are performed on newborns to screen for various metabolic and genetic diseases before symptoms appear. Excess blood is spotted on Guthrie cards, and these dried blood spots (DBS) are archived by many states in biobanks for follow-up diagnoses and public health research. However, storage environments may vary across biobanks and over time within a biobank. As DNA extracted from these spots is increasingly used in genetic studies, identifying factors associated with genotyping success is critical to maximize DNA quality for future studies.
Method: We evaluated 399 blood spots archived at the Michigan Neonatal Biobank between 1992 and 2008, which were part of a genome-wide association study of childhood leukemia risk in children with Down syndrome. High-quality DNA was defined as a post-quality-control call rate ≥ 99.0% based on the Illumina GenomeStudio 2.0 GenCall algorithm, after processing the samples on the Illumina Infinium Global Screening Array. Bivariate analyses and multivariable logistic regression models were applied to evaluate the effects of storage environment and storage duration on DNA genotyping quality.
Results: Both storage environment and duration were associated with sample genotyping call rates (p-values < 0.001). Sample call rates were associated with storage duration independently of storage environment (p-trend = 0.006 for DBS archived in an uncontrolled environment and p-trend = 0.002 in a controlled environment). Nevertheless, 95% of all samples had acceptable genotyping quality, with a call rate ≥ 95.0%, a standard threshold for acceptable sample quality in many genetic studies.
Conclusion: Blood spot DNA quality was lower for samples archived in uncontrolled storage environments and for samples archived for longer durations. Still, regardless of storage environment or duration, neonatal biobanks, including the Michigan Neonatal Biobank, can provide access to large collections of spots with DNA quality acceptable for most genotyping studies.
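The two call-rate thresholds in this abstract (≥ 99.0% for high-quality DNA and ≥ 95.0% as a common acceptance cutoff) can be applied per sample as in the following minimal sketch. It assumes a sample's genotype calls are already available as a list with None marking no-calls; it does not reproduce GenCall, which produces those calls in the first place.

```python
from typing import Optional, Sequence

HIGH_QUALITY = 0.99  # the study's definition of high-quality DNA
ACCEPTABLE = 0.95    # common acceptance threshold cited in the study


def call_rate(calls: Sequence[Optional[str]]) -> float:
    """Fraction of SNPs with a genotype call (None marks a no-call)."""
    if not calls:
        raise ValueError("empty call list")
    return sum(c is not None for c in calls) / len(calls)


def quality_tier(rate: float) -> str:
    """Classify a sample call rate against the two thresholds."""
    if rate >= HIGH_QUALITY:
        return "high"
    if rate >= ACCEPTABLE:
        return "acceptable"
    return "fail"
```

For example, a sample called at 96 of 100 SNPs has a call rate of 0.96 and falls in the "acceptable" tier: usable in many studies, but below the study's high-quality definition.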
Project description:
Background: Many aspects of transfusion medicine are affected by genetics. Current single-nucleotide polymorphism (SNP) arrays are limited in the number of targets they can interrogate and cannot detect all variation of interest. We designed a transfusion medicine array (TM-Array) to study both common and rare transfusion-relevant variation in genetically diverse donor and recipient populations.
Study design and methods: The array was designed through extensive bioinformatics mining and expert consultation to identify genes and genetic variation related to a wide range of clinically relevant and research-related topics in transfusion medicine. Copy number polymorphisms were added in the alpha globin, beta globin, and Rh gene clusters.
Results: The final array contains approximately 879,000 SNP and copy number polymorphism markers. Over 99% of SNPs were called reliably. Technical replication showed the array to be robust and reproducible, with an error rate below 0.03%. The array also had a very low Mendelian error rate (average parent-child trio accuracy of 0.9997). Blood group results were concordant with serology testing, and the array accurately identified rare variants (minor allele frequency of 0.5%). At a minor allele frequency of 5%, the array achieved high genome-wide imputation coverage for African-American (97.5%), Hispanic (96.1%), East Asian (94.6%), and white (96.1%) genomes.
Conclusions: A custom array for transfusion medicine research has been designed and evaluated. It provides wide coverage and accurate identification of rare SNPs in diverse populations. The TM-Array will be useful for future genetic studies across the diverse fields of transfusion medicine research.
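A parent-child trio accuracy such as 0.9997 is the fraction of genotype calls consistent with Mendelian inheritance. The sketch below shows the basic check at a single autosomal SNP, assuming genotypes are unordered allele pairs; it is illustrative only and ignores complications such as X-linked markers and the copy number polymorphisms the TM-Array also contains.

```python
from typing import Iterable, Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles, e.g. ("A", "G")


def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child can carry one allele from each parent (autosomal locus)."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def trio_concordance(
    trios: Iterable[Tuple[Optional[Genotype], Optional[Genotype], Optional[Genotype]]]
) -> float:
    """Fraction of fully called (child, mother, father) records that pass the check."""
    results = [
        mendelian_consistent(c, m, f)
        for c, m, f in trios
        if c is not None and m is not None and f is not None
    ]
    if not results:
        raise ValueError("no fully called trios")
    return sum(results) / len(results)
```

Records with any missing call are skipped rather than counted as errors, so the concordance reflects only genotypes the array actually reported.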
Project description: The keystone aquatic herbivore Daphnia has been studied for more than 150 years in the context of evolution, ecology and ecotoxicology. Although it is rapidly becoming a model for environmental and population genomics, few genome-wide studies have been conducted in natural populations. We report a unique resource of novel single nucleotide polymorphism (SNP) markers for Daphnia pulicaria, developed with genotyping-by-sequencing, an approach that reduces genome complexity using restriction enzymes. Using the D. pulex genome as a reference, SNPs were scored for 53 clones from five natural populations that varied in lake trophic status. Our analyses yielded 32,313 high-confidence, bi-allelic SNP markers. Mapping 1,364 outlier SNPs onto the annotated D. pulex genome identified 2,335 genes, including 565 within functional genes. Of the 885 EuKaryotic Orthologous Groups (KOGs) found from outlier SNPs, 294 were involved in three metabolic and four regulatory pathways. Bayesian clustering analyses showed two distinct population clusters, possibly reflecting the combined effects of geography and lake trophic status. Our results provide a valuable tool for future population genomics surveys in Daphnia that target informative regions related to physiological processes, which can be linked to the ecology of this emerging eco-responsive taxon.
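Retaining only bi-allelic sites is one of the filters behind the final marker set. A minimal sketch of such a filter follows, assuming per-site allele observations pooled across clones (None for missing); the study's actual pipeline and confidence criteria are not described here, so the `min_minor` parameter is a hypothetical illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def is_biallelic(alleles: Iterable[Optional[str]], min_minor: int = 1) -> bool:
    """True if exactly two distinct alleles are observed and the rarer one
    appears at least `min_minor` times. None marks a missing observation."""
    counts = Counter(a for a in alleles if a is not None)
    if len(counts) != 2:
        return False  # monomorphic or multi-allelic site
    return min(counts.values()) >= min_minor
```

Raising `min_minor` trades sensitivity for confidence: it discards sites whose second allele is seen only once, which could be a sequencing error rather than a true polymorphism.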
Project description: SNP genotyping has emerged as a technology for incorporating copy number variants (CNVs) into genetic analyses of human traits. However, the extent to which SNP platforms accurately capture CNVs remains unclear. Using independent, sequence-based CNV maps, we find that commonly used SNP platforms have limited or no probe coverage for a large fraction of CNVs. Despite this, we inferred 368 CNVs in nine samples using Illumina SNP genotyping data and experimentally validated over two-thirds of them. We also developed a method, SNP-Conditional Mixture Modeling (SCIMM), to robustly genotype deletions using as few as two SNP probes. We find that HapMap SNPs are strongly correlated with 82% of common deletions, whereas the newest SNP platforms effectively tag only about 50%. We conclude that currently available genome-wide SNP assays can capture CNVs accurately, but improvements in array designs, particularly in duplicated sequences, are necessary to facilitate more comprehensive analyses of genomic variation.
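Mixture modeling is the general idea behind genotyping a deletion from SNP probe intensities: samples carrying zero, one or two copies form separate intensity clusters. The sketch below is a plain one-dimensional Gaussian mixture fitted by expectation-maximization, not SCIMM itself (it does not condition on SNP genotypes). It assumes each sample has already been reduced to one normalized intensity, for example the mean over the two or more probes inside the deletion, and that all three copy-number classes are present.

```python
import math
from typing import List, Sequence, Tuple


def _normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(
    x: Sequence[float], k: int = 3, iters: int = 200, var_floor: float = 1e-4
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    xs = sorted(x)
    n = len(xs)
    grand_mean = sum(xs) / n
    # Start means at evenly spaced quantiles and variances at the overall variance.
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    var = [max(sum((v - grand_mean) ** 2 for v in xs) / n, var_floor)] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for v in x:
            dens = [w[j] * _normal_pdf(v, mu[j], var[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            w[j] = nj / n
            mu[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var[j] = max(sum(r[j] * (v - mu[j]) ** 2 for r, v in zip(resp, x)) / nj, var_floor)
    return w, mu, var


def genotype_deletion(intensities: Sequence[float]) -> List[int]:
    """Assign 0, 1 or 2 copies per sample; the lowest-mean cluster is 0 copies."""
    w, mu, var = fit_gmm_1d(intensities, k=3)
    order = sorted(range(3), key=lambda j: mu[j])  # component indices by mean
    copies_of = {comp: copies for copies, comp in enumerate(order)}
    calls = []
    for v in intensities:
        best = max(range(3), key=lambda j: w[j] * _normal_pdf(v, mu[j], var[j]))
        calls.append(copies_of[best])
    return calls
```

A deletion with no homozygous carriers in the sample would need a two-component fit; choosing the number of components is left out of this sketch.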
Project description: Genotyping-by-Sequencing (GBS) is an excellent tool for characterising genetic variation between plant genomes. To date, its use has been reported only for genotyping single individuals. However, many applications would benefit greatly from resolving allele frequencies within populations on a genome-wide scale; examples include the breeding of outbreeding species, varietal protection in outbreeding species, and monitoring changes in population allele frequencies. This motivated us to test the potential of GBS to evaluate allele frequencies within populations. Perennial ryegrass is an outbreeding species, and its breeding programs are based on selection within populations. We tested two restriction enzymes for their efficiency in reducing the complexity of the perennial ryegrass genome. We term the resulting profiles Genome Wide Allele Frequency Fingerprints (GWAFFs), and we show how these fingerprints can be used to distinguish between plant populations. Even at current costs and throughput, using sequencing to evaluate populations directly on a genome-wide scale is viable. GWAFFs should find many applications, from varietal development in outbreeding species to protecting plant breeders' rights.
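When DNA from a whole population is sequenced as one pooled sample, the allele frequency at each GBS locus can be estimated from read counts, and the resulting vector acts as a fingerprint. The sketch below illustrates that idea under stated assumptions: read counts per locus are given as (reference, alternative) pairs, loci below a hypothetical minimum depth are masked, and two fingerprints are compared by mean absolute difference. The depth cutoff and the distance measure are illustrative choices, not the published GWAFF procedure.

```python
from typing import List, Optional, Sequence, Tuple

Fingerprint = List[Optional[float]]


def allele_frequencies(counts: Sequence[Tuple[int, int]], min_depth: int = 5) -> Fingerprint:
    """Reference-allele frequency per locus from pooled read counts.

    Loci with fewer than `min_depth` reads (or no reads at all) are masked as None.
    """
    freqs: Fingerprint = []
    for ref, alt in counts:
        depth = ref + alt
        freqs.append(ref / depth if depth >= min_depth and depth > 0 else None)
    return freqs


def fingerprint_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Mean absolute allele-frequency difference over loci covered in both."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no loci covered in both fingerprints")
    return sum(abs(x - y) for x, y in shared) / len(shared)
```

Masking low-depth loci matters because a frequency estimated from two reads can only be 0, 0.5 or 1, which would inflate the apparent difference between populations.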