Project description:Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a procedure using phi29 DNA polymerase and trehalose, with optimized control of amplification, to create micrograms of specific amplicons without TIPs from as little as sub-femtogram amounts of DNA. Amplicons generated from 5 ng and 0.5 ng of DNA, from either good-quality gDNA (05-050) or partially degraded gDNA (04-018), were validated with the Illumina HumanHap550-Duo Genotyping BeadChip. As shown in Suppl. Table 5a, the call rates (97.30% to 99.07%) and accuracy or concordance (>99.85% for the SNPs called in both amplicon and native reference) for amplicons derived from 5 ng with both Wpa and Gv2 were close to each other and close to those of native gDNA (call rate: 98.3% to 99.75%). These call rates were better than those in a recent report (amplicon 95.9% vs. unamplified 98.5%), which used the earlier Repli-g 625S kit, re-genotyped samples when performance was low, and filtered duplicate samples for the highest call rate. The genotyping accuracy of Wpa was in the same range as the variation seen between technical replicates on similar SNP typing arrays (99.87% and 99.88%, for replicated Affymetrix arrays or between Affymetrix and Illumina arrays). Importantly, the genotyping concordance for amplicons generated from 0.5 ng with Wpa (99.88% and 99.69%) was also close to that of the technical replicates. In this case, the call rates of Wpa were slightly reduced compared with those obtained with 5 ng input, but the call rate for the partially degraded sample 04-018 was modestly improved over Gv2 (92.06% vs. 90.53%). Wpa data also showed some amplification non-uniformity among different loci, resulting in some “artificial CNVs” similar to Gv2 (examples in Suppl. Fig. 5 and Suppl. Table 6) when unamplified gDNAs were taken as the reference. 
This imbalance, however, was consistent and reproducible for each method but differed between Wpa and Gv2. These artificial CNVs can be efficiently cancelled if a pair-wise amplified test and reference are compared, as observed in the CGH results (Fig. 4 and Suppl. Fig. 4) and as also reported by others {Pugh 2008}. Notably, the representation of chromosomal terminal sequences was greatly improved with Wpa compared with Gv2 (Fig. 5), and some of these regions were significantly under-amplified or even lost with Gv2 (Suppl. Fig. 5 and Suppl. Tables 6 and 7), as also independently reported recently {Pugh 2008}. This occurred especially in the terminal 3 to 5 Mb, sometimes extending to 10 Mb, of many chromosome termini, and was particularly severe when low amounts of DNA or degraded DNA were used as input. An analysis of the 5 Mb termini is shown: Suppl. Table 5b treats all involved SNPs as a single cohort, whereas Fig. 5 and Suppl. Tables 6 and 7 give the results for each chromosome terminus. Importantly, the SNP typing was also greatly improved, best exemplified by the amplicons from 0.5 ng input of the partially degraded sample 04-018, with Wpa versus Gv2 call rates of 91.9% vs. 84.45% and accuracies of 99.57% vs. 98.62%. The results also showed that the under-representation of these terminal regions with Gv2 was not strictly associated with the distance to the chromosome end, but was possibly a sequence-related issue. Keywords: Whole-pool amplification, whole genome SNP typing
The overall goal of this part of the study was to validate the quality of the amplicons generated with our new procedure, Wpa, from different amounts (5 ng and 0.5 ng) of starting gDNA, either of good quality (sample 05-050) or partially degraded (sample 04-018), with native gDNA as the control. Quality was assessed in terms of call rate and accuracy (allele bias) as well as the uniformity of the amplified sequence (sequence representation, or sequence bias). 
Amplified and native genomic DNA isolated from patients were genotyped in parallel on the same experimental platform, with the native genomic DNAs serving as the standard controls. For sequence representation, the signal of the two alleles of each SNP from a panel of multiple native DNAs, provided by the experimental platform (Illumina), was used as the reference, so that an abstract signal for the sequence representation of each SNP, and of all SNPs together, was obtained.
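The call rate and concordance figures quoted in this description can be illustrated with a minimal sketch. It assumes genotypes are held as plain dictionaries keyed by SNP identifier, with `None` standing for a no-call; the function names and the toy data are hypothetical and are not taken from the study or from any Illumina software.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[str]  # e.g. "AA", "AG", "GG"; None means no call


def call_rate(calls: Dict[str, Genotype]) -> float:
    """Fraction of SNPs that received a genotype call."""
    if not calls:
        raise ValueError("no SNPs supplied")
    called = sum(1 for g in calls.values() if g is not None)
    return called / len(calls)


def concordance(amplicon: Dict[str, Genotype],
                reference: Dict[str, Genotype]) -> Tuple[float, int]:
    """Agreement restricted to SNPs called in BOTH samples.

    Returns (fraction of matching genotypes, number of SNPs compared).
    """
    shared = [s for s in amplicon
              if amplicon[s] is not None and reference.get(s) is not None]
    if not shared:
        raise ValueError("no SNPs called in both samples")
    matches = sum(1 for s in shared if amplicon[s] == reference[s])
    return matches / len(shared), len(shared)


# Toy data (hypothetical): amplified sample versus native gDNA reference.
amplicon_calls = {"rs1": "AA", "rs2": "AG", "rs3": None, "rs4": "GG"}
native_calls = {"rs1": "AA", "rs2": "AA", "rs3": "CT", "rs4": "GG"}

rate = call_rate(amplicon_calls)                      # 3 of 4 SNPs called
agree, n_compared = concordance(amplicon_calls, native_calls)
```

Restricting the comparison to SNPs called in both samples mirrors the definition given above ("SNPs called in both amplicon and native reference"), so a low call rate does not by itself lower the concordance.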
Project description:Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a procedure using phi29 DNA polymerase and trehalose, with optimized control of amplification, to create micrograms of specific amplicons without TIPs from as little as sub-femtogram amounts of DNA. With high-resolution chromosome-wide CGH (HR-CGH), the amplicons from 5 ng and 0.5 ng of DNA, from either good-quality gDNA (05-050) or partially degraded gDNA (04-018), faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22, and even a homozygous deletion smaller than 1 kb. Specifically, HR-CGH with amplicons derived from 5 ng of input gDNA detected all previously known chromosomal segmental aberrations in chromosome 22 in samples from two different probands, and was indistinguishable from the HR-CGH result with native gDNA from the same probands (Fig. 4, Fig. S3 and S4). The breakpoints were also precisely demonstrated. These aberrations include a heterozygous genomic segmental duplication (3 copies, 3 Mb in size, sample 05-050, Fig. 4) and 2 different heterozygous deletions (1 copy, 1.4 Mb and 18 kb respectively, sample 04-018, Fig. S4), all of which are located in or bounded by regions of low copy repeats (LCRs). In addition, a previously known homozygous deletion of 975 bp (in 04-018 and 05-050) was again accurately demonstrated (05-050 data shown in Fig. 4c), although sometimes (04-018) the data were a little noisier than with unamplified DNA (Fig. S4d). In contrast, Wpa-40°C resulted in abundant signal noise and failed to detect these copy number aberrations (Fig. 4, Fig. S3, S4). Impressively, HR-CGH with amplicons derived from 0.5 ng of gDNA via Wpa also clearly detected the known CNVs, although with more noise (Fig. S4). 
The amplicons derived from 0.1 ng of gDNA via Wpa could not unambiguously show CNVs because of higher signal variability, but the CNV patterns were mostly well maintained (Fig. S4 for 04-018). We did also notice some locus imbalance in the amplicons; however, this was minimized, was reproducible when the input was above a certain threshold amount, and could be well compensated if the same amplified reference sample was applied in parallel, as shown above. Keywords: Whole-pool amplification, high resolution comparative genome hybridization (HR-CGH)
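The compensation described above, in which a reproducible locus imbalance cancels when test and reference are amplified in parallel, can be shown with a small numerical sketch. It assumes a purely multiplicative, per-locus amplification bias that is identical for both samples; the bias values and copy numbers below are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def log2_ratios(test: List[float], reference: List[float]) -> List[float]:
    """Per-locus log2(test / reference), as used in CGH-style comparisons."""
    if len(test) != len(reference):
        raise ValueError("test and reference must cover the same loci")
    return [math.log2(t / r) for t, r in zip(test, reference)]


# Hypothetical true copy numbers at four loci; locus 3 is duplicated in the test.
test_copies = [2.0, 2.0, 3.0, 2.0]
ref_copies = [2.0, 2.0, 2.0, 2.0]

# Assumed reproducible amplification bias per locus (same for every sample).
bias = [1.0, 0.5, 2.0, 0.25]

amplified_test = [c * b for c, b in zip(test_copies, bias)]
amplified_ref = [c * b for c, b in zip(ref_copies, bias)]

# Amplified test against native reference: the bias shows up as artificial CNVs.
vs_native = log2_ratios(amplified_test, ref_copies)

# Amplified test against amplified reference: the shared bias divides out.
vs_amplified = log2_ratios(amplified_test, amplified_ref)
```

In this toy case `vs_native` deviates from zero at three of the four loci, whereas `vs_amplified` is zero everywhere except at the genuinely duplicated locus, where it equals log2(3/2). The cancellation holds only to the extent that the bias really is reproducible between the two amplifications.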
Project description:Sequencing reads overlapping polymorphic sites in diploid mammalian genomes may be assigned to one allele or the other. This holds the potential to detect gene expression, chromatin modifications, DNA methylation or nuclear interactions in an allele-specific fashion. SNPsplit is an allele-specific alignment sorter designed to read files in SAM/BAM format and determine the allelic origin of reads or read-pairs that cover known single nucleotide polymorphic (SNP) positions. For this to work, libraries must have been aligned, using a suitable mapping program such as Bowtie2, TopHat, STAR, HISAT2, HiCUP or Bismark, to a genome in which all known SNP positions were masked with the ambiguity base 'N'. SNPsplit also provides an automated solution to generate N-masked reference genomes for hybrid mouse strains based on the variant call information provided by the Mouse Genomes Project. The unique ability of SNPsplit to work with different kinds of sequencing data, including RNA-Seq, ChIP-Seq, Bisulfite-Seq or Hi-C, opens new avenues for the integrative exploration of allele-specific data.
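The idea of sorting a read by the bases it carries at known SNP positions can be sketched as follows. This is a simplified illustration, not SNPsplit's implementation: it assumes an ungapped alignment (no insertions, deletions or clipping), single-end reads, and a hypothetical in-memory table mapping each SNP position to the alleles of the two parental genomes. The function name and the category labels are chosen for this sketch.

```python
from typing import Dict, Tuple


def assign_allele(read_start: int, read_seq: str,
                  snps: Dict[int, Tuple[str, str]]) -> str:
    """Classify a read by the bases it carries at known SNP positions.

    snps maps a reference position to (genome1 allele, genome2 allele).
    Assumes the read aligns without gaps, starting at read_start.
    """
    votes = set()
    for offset, base in enumerate(read_seq):
        pos = read_start + offset
        if pos not in snps:
            continue
        allele1, allele2 = snps[pos]
        if base == allele1:
            votes.add("genome1")
        elif base == allele2:
            votes.add("genome2")
        # Any other base (e.g. a sequencing error or N) is uninformative.
    if not votes:
        return "unassignable"   # no informative SNP covered
    if len(votes) == 2:
        return "conflicting"    # different SNPs point to different genomes
    return votes.pop()


# Hypothetical SNP table: position -> (genome1 allele, genome2 allele).
snp_table = {105: ("A", "G"), 110: ("C", "T")}

label_a = assign_allele(100, "ACGTAGCCTTT", snp_table)  # G at 105, T at 110
label_b = assign_allele(100, "ACGTAACCTTT", snp_table)  # A at 105, T at 110
label_c = assign_allele(200, "ACGTACGT", snp_table)     # covers no SNP
```

Aligning against an N-masked genome, as the description requires, means neither allele is favoured during mapping; the allele is read off afterwards from the read sequence itself, which is what the lookup above imitates.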
Project description:Large-scale pharmacogenetics and complex disease association studies will require typing of thousands of single-nucleotide polymorphisms (SNPs) in thousands of individuals. Such projects would benefit from a genotyping system with accuracy >99% and a failure rate <5% on a simple, reliable, and flexible platform. However, such a system is not yet available for routine laboratory use. We have evaluated a modification of the previously reported Invader SNP-typing chemistry for use in a genotyping laboratory and tested its automation. The Invader technology uses a Flap Endonuclease for allele discrimination and a universal fluorescence resonance energy transfer (FRET) reporter system. Three hundred and eighty-four individuals were genotyped across a panel of 36 SNPs and one insertion/deletion polymorphism with Invader assays using PCR product as template, a total of 14,208 genotypes. An average failure rate of 2.3% was recorded, mostly associated with PCR failure, and the typing was 99.2% accurate when compared with genotypes generated with established techniques. An average signal-to-noise ratio (9:1) was obtained. The high degree of discrimination for single base changes, coupled with homogeneous format, has allowed us to deploy liquid handling robots in a 384-well microtitre plate format and an automated end-point capture of fluorescent signal. Simple semiautomated data interpretation allows the generation of approximately 25,000 genotypes per person per week, which is 10-fold greater than gel-based SNP typing and microsatellite typing in our laboratory. Savings on labor costs are considerable. We conclude that Invader chemistry using PCR products as template represents a useful technology for typing large numbers of SNPs rapidly and efficiently.
Project description:High resolution melting (HRM) analysis is gaining prominence as a method for discriminating DNA sequence variants. Its advantage is that it is performed in a real-time PCR device, and the PCR amplification and HRM analysis are closed tube, and effectively single step. We have developed an HRM-based method for Staphylococcus aureus genotyping. Eight single nucleotide polymorphisms (SNPs) were derived from the S. aureus multi-locus sequence typing (MLST) database on the basis of maximized Simpson's Index of Diversity. Only G↔A, G↔T, C↔A, C↔T SNPs were considered for inclusion, to facilitate allele discrimination by HRM. In silico experiments revealed that DNA fragments incorporating the SNPs give much higher resolving power than randomly selected fragments. It was shown that the predicted optimum fragment size for HRM analysis was 200 bp, and that other SNPs within the fragments contribute to the resolving power. Six DNA fragments ranging from 83 bp to 219 bp, incorporating the resolution optimized SNPs were designed. HRM analysis of these fragments using 94 diverse S. aureus isolates of known sequence type or clonal complex (CC) revealed that sequence variants are resolved largely in accordance with G+C content. A combination of experimental results and in silico prediction indicates that HRM analysis resolves S. aureus into 268 "melt types" (MelTs), and provides a Simpson's Index of Diversity of 0.978 with respect to MLST. There is a high concordance between HRM analysis and the MLST defined CCs. We have generated a Microsoft Excel key which facilitates data interpretation and translation between MelT and MLST data. The potential of this approach for genotyping other bacterial pathogens was investigated using a computerized approach to estimate the densities of SNPs with unlinked allelic states. 
The MLST databases for all species tested contained abundant unlinked SNPs, thus suggesting that high resolving power is not dependent upon large numbers of SNPs.
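Simpson's Index of Diversity, used above both to select SNPs and to report resolving power, is commonly computed for typing schemes as D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N the total number of isolates. The sketch below implements that formula; the function name and the toy type assignments are hypothetical, and it does not reproduce how the study computed diversity "with respect to MLST".

```python
from collections import Counter
from typing import Hashable, Iterable


def simpsons_index(types: Iterable[Hashable]) -> float:
    """Simpson's Index of Diversity for a list of type assignments.

    D is the probability that two isolates drawn at random without
    replacement belong to different types.
    """
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Toy example: four isolates assigned to three hypothetical melt types.
d_toy = simpsons_index(["MelT1", "MelT1", "MelT2", "MelT3"])
```

A value of 0 means every isolate has the same type and a value of 1 means every isolate is distinct, so the reported 0.978 indicates that two randomly chosen isolates would nearly always be distinguished.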
Project description:The amount of genetic variation discovered in human populations is growing rapidly leading to challenging computational tasks, such as variant calling. Standard methods for addressing this problem include read mapping, a computationally expensive procedure; thus, mapping-free tools have been proposed in recent years. These tools focus on isolated, biallelic SNPs, providing limited support for multi-allelic SNPs and short insertions and deletions of nucleotides (indels). Here we introduce MALVA, a mapping-free method to genotype an individual from a sample of reads. MALVA is the first mapping-free tool able to genotype multi-allelic SNPs and indels, even in high-density genomic regions, and to effectively handle a huge number of variants. MALVA requires one order of magnitude less time to genotype a donor than alignment-based pipelines, providing similar accuracy. Remarkably, on indels, MALVA provides even better results than the most widely adopted variant discovery tools.
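The general idea behind mapping-free genotyping, counting k-mers in the reads and asking which allele-specific k-mers are present, can be sketched in a few lines. This is a deliberately naive illustration and not MALVA's algorithm: it assumes one hand-picked signature k-mer per allele, ignores reverse complements and sequencing errors, and uses a fixed count threshold instead of a statistical model. All names and sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def allele_support(kmer_counts: Counter,
                   signatures: Dict[str, str]) -> Dict[str, int]:
    """Look up the count of each allele's signature k-mer."""
    return {allele: kmer_counts.get(sig, 0)
            for allele, sig in signatures.items()}


def naive_genotype(support: Dict[str, int],
                   min_count: int = 2) -> Optional[Tuple[str, str]]:
    """Pick up to two best-supported alleles; None if nothing passes."""
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [allele for allele, count in ranked if count >= min_count][:2]
    if not kept:
        return None
    if len(kept) == 1:
        return (kept[0], kept[0])          # homozygous
    return tuple(sorted(kept))             # heterozygous


# A hypothetical multi-allelic SNP with context AC[G/T/C]TA, k = 5.
signatures = {"G": "ACGTA", "T": "ACTTA", "C": "ACCTA"}
reads = ["TTACGTAGG", "GGACTTACC", "TTACGTAGG", "AAACTTAGG"]

support = allele_support(count_kmers(reads, 5), signatures)
genotype = naive_genotype(support)
```

Because no read is ever aligned, the cost is dominated by k-mer counting, which is the property that lets mapping-free tools avoid the expensive read-mapping step mentioned above.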
Project description:Motivation: Computational identification of copy number variants (CNVs) in sequencing data is a challenging task. Existing CNV-detection methods account for various sources of variation and perform different normalization strategies. However, their applicability and predictions are restricted to specific enrichment protocols. Here, we introduce a novel tool named varAmpliCNV, specifically designed for CNV-detection in amplicon-based targeted resequencing data (Haloplex™ enrichment protocol) in the absence of matched controls. VarAmpliCNV utilizes principal component analysis (PCA) and/or metric dimensional scaling (MDS) to control variances of amplicon associated read counts enabling effective detection of CNV signals. Results: Performance of VarAmpliCNV was compared against three existing methods (ConVaDING, ONCOCNV and DECoN) on data of 167 samples run with an aortic aneurysm gene panel (n = 30), including 9 positive control samples. Additionally, we validated the performance on a large deafness gene panel (n = 145) run on 138 samples, containing 4 positive controls. VarAmpliCNV achieved higher sensitivity (100%) and specificity (99.78%) in comparison to competing methods. In addition, unsupervised clustering of CNV segments and visualization plots of amplicons spanning these regions are included as a downstream strategy to filter out false positives. Availability and implementation: The tool is freely available through galaxy toolshed and at: https://hub.docker.com/r/cmgantwerpen/varamplicnv. Supplementary Data File S1: https://tinyurl.com/2yzswyhh; Supplementary Data File S2: https://tinyurl.com/ycyf2fb4. Supplementary information: Supplementary data are available at Bioinformatics online.
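One way PCA can be used to control variance in a samples-by-amplicons matrix is to estimate the dominant component of variation and subtract it, leaving residuals in which sample-specific deviations are easier to see. The sketch below does this for the first component only, using power iteration in plain Python. It is an assumption-laden illustration, not varAmpliCNV's procedure: the number of components removed, the normalization of the counts and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = amplicons


def center_columns(x: Matrix) -> Matrix:
    """Subtract each amplicon's mean across samples."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]


def first_component(x: Matrix, iterations: int = 200) -> List[float]:
    """Leading right singular vector of x by power iteration on x^T x.

    Starts from a uniform vector, so it assumes the leading component
    is not exactly orthogonal to that starting vector.
    """
    p = len(x[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(row[j] * v[j] for j in range(p)) for row in x]
        w = [sum(scores[i] * x[i][j] for i in range(len(x)))
             for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def remove_first_component(x: Matrix) -> Matrix:
    """Center the matrix and project out its first principal component."""
    xc = center_columns(x)
    v = first_component(xc)
    residual = []
    for row in xc:
        score = sum(r * c for r, c in zip(row, v))
        residual.append([r - score * c for r, c in zip(row, v)])
    return residual


# Toy matrix in which all variation follows a single shared pattern.
pattern = [1.0, 2.0, 3.0]
sample_factors = [1.0, 2.0, 3.0, 4.0]
toy = [[f * p for p in pattern] for f in sample_factors]
residuals = remove_first_component(toy)
```

For this rank-one toy matrix the residuals are numerically zero, since the single shared pattern explains all of the variance; in real data the residuals would retain whatever the removed component does not capture, which is where a CNV signal would be sought.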
Project description:Background: Single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) are the most common type of polymorphisms and are frequently used for molecular marker development. Such markers have become very popular for all kinds of genetic analysis, including haplotype reconstruction. Haplotypes can be reconstructed for whole chromosomes but also for specific genes, based on the SNPs present. Haplotypes in the latter context represent the different alleles of a gene. The computational approach to SNP mining is becoming increasingly popular because of the continuously increasing number of sequences deposited in databases, which allows a more accurate identification of SNPs. Several software packages have been developed for SNP mining from databases. Of these, QualitySNP is the only tool that combines SNP detection with the reconstruction of alleles, which results in a lower number of false positive SNPs and also works much faster than other programs. We have built a web-based SNP discovery and allele detection tool (HaploSNPer) based on QualitySNP. Results: HaploSNPer is a flexible web-based tool for detecting SNPs and alleles in user-specified input sequences from both diploid and polyploid species. It includes BLAST for finding homologous sequences in public EST databases, CAP3 or PHRAP for aligning them, and QualitySNP for discovering reliable allelic sequences and SNPs. All possible and reliable alleles are detected by a mathematical algorithm using potential SNP information. Reliable SNPs are then identified based on the reconstructed alleles and on sequence redundancy. Conclusion: Thorough testing of HaploSNPer (and the underlying QualitySNP algorithm) has shown that EST information alone is sufficient for the identification of alleles and that reliable SNPs can be found efficiently. Furthermore, HaploSNPer supplies a user-friendly interface for visualization of SNPs and alleles. 
HaploSNPer is available from http://www.bioinformatics.nl/tools/haplosnper/.
Project description:Background: Single nucleotide polymorphisms (SNPs) have been associated with many aspects of human development and disease, and many non-coding SNPs associated with disease risk are presumed to affect gene regulation. We have previously shown that SNPs within transcription factor binding sites can affect transcription factor binding in an allele-specific and heritable manner. However, such analysis has relied on prior whole-genome genotypes provided by large external projects such as HapMap and the 1000 Genomes Project. This requirement limits the study of allele-specific effects of SNPs in primary patient samples from diseases of interest, where complete genotypes are not readily available. Results: In this study, we show that we are able to identify SNPs de novo and accurately from ChIP-seq data generated in the ENCODE Project. Our de novo identified SNPs from ChIP-seq data are highly concordant with published genotypes. Independent experimental verification of more than 100 sites estimates our false discovery rate at less than 5%. Analysis of transcription factor binding at de novo identified SNPs revealed widespread heritable allele-specific binding, confirming previous observations. SNPs identified from ChIP-seq datasets were significantly enriched for disease-associated variants, and we identified dozens of allele-specific binding events in non-coding regions that could distinguish between disease and normal haplotypes. Conclusions: Our approach combines SNP discovery, genotyping and allele-specific analysis, but is selectively focused on functional regulatory elements occupied by transcription factors or epigenetic marks, and will therefore be valuable for identifying the functional regulatory consequences of non-coding SNPs in primary disease samples.
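Allele-specific binding at a heterozygous SNP is often assessed by asking whether the ChIP-seq reads carrying each allele depart from the 50:50 split expected without allelic preference. The description above does not state which test was used, so the exact two-sided binomial test below is an assumption made for illustration; the function name and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are counted despite rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical read counts at one heterozygous SNP inside a binding site.
balanced = binomial_two_sided_p(5, 10)    # 5 reference reads out of 10
skewed = binomial_two_sided_p(10, 10)     # all 10 reads carry one allele
```

With many SNPs tested at once, p-values like these would normally be corrected for multiple testing before any site is called allele-specific.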