Project description:Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a procedure using phi29 DNA polymerase and trehalose, with optimized control of amplification, to create micrograms of specific amplicons without TIPs from as little as sub-femtogram amounts of DNA. The amplicons from 5 ng and 0.5 ng DNA, derived from good-quality gDNA (05-050) or partially degraded gDNA (04-018), were validated with the Illumina HumanHap550-Duo Genotyping Beadchip. As seen in Suppl. Table 5a, the call rates (97.30% to 99.07%) and accuracy or concordance (> 99.85% for the SNPs called in both amplicon and natural reference) for 5 ng-derived amplicons with both Wpa and Gv2 were close to each other and close to native gDNA (call rate: 98.3% to 99.75%). These call rates were better than those in a recent report (amplicon 95.9% vs. un-amplified 98.5%), in which the early kit Repli-g 625S was applied, re-genotyping was performed when the performance was low, and duplicate samples were filtered for the highest call rate. The genotyping accuracy of Wpa was in the same range as the variation in technical replicates with similar SNP typing arrays (99.87% and 99.88%, replicated Affymetrix array, or between Affymetrix and Illumina arrays). Importantly, the genotyping concordance for amplicons generated from 0.5 ng with Wpa (99.88% and 99.69%) was also close to that of the technical replicates. In this case, the call rates of Wpa were slightly reduced compared to those with 5 ng input, but the call rate for the partially degraded sample 04-018 was modestly improved over Gv2 (92.06% vs. 90.53%). Wpa data also showed some amplification non-uniformity among different locations, resulting in some “artificial CNVs” similar to Gv2 (exemplified in Suppl. Fig. 5 and Suppl. Table 6), with the outputs obtained by taking unamplified gDNAs as their reference.
This imbalance, however, was consistent and reproducible for each method but differed between Wpa and Gv2. These artificial CNVs can be efficiently cancelled if pair-wise amplified test and reference are compared, as observed in the CGH result (Fig. 4 and Suppl. Fig. 4) and as also supported by others {Pugh 2008}. It is interesting to note that the representation of chromosomal terminal sequences was greatly improved with Wpa compared with Gv2 (Fig. 5), and that some of these regions were significantly under-amplified or even lost with Gv2 (Suppl. Fig. 5 and Suppl. Tables 6 and 7), as also independently reported recently {Pugh 2008}. This occurred especially in the terminal 3 to 5 Mb, sometimes extending to 10 Mb, in many chromosome termini, and was particularly serious when low levels of DNA or degraded DNA were used as input. An analysis for the 5 Mb termini is shown (Suppl. Table 5b treats all involved SNPs as a cohort; Fig. 5 and Suppl. Tables 6 and 7 give the result for each chromosome terminus). Importantly, the SNP typing was also greatly improved, best exemplified by the amplicons of 0.5 ng input for the partially degraded 04-018, with Wpa versus Gv2 call rates of 91.9% vs. 84.45% and accuracy of 99.57% vs. 98.62%. The result also showed that the underrepresentation of these terminal regions in Gv2 was not strictly associated with the distance to the chromosome end, but was possibly a sequence-related issue. Keywords: Whole-pool amplification, whole genome SNP typing The overall goal of this part of the study was to validate the quality of the amplicons produced with our new procedure, Wpa, from different amounts (5 ng and 0.5 ng) of starting gDNA, either good quality (sample 05-050) or partially degraded (sample 04-018), with native gDNA as control, in terms of call rate and accuracy (allele bias) as well as the uniformity of the amplified sequence (sequence representation or sequence bias).
Amplified or native genomic DNA isolated from patients was analyzed/genotyped in parallel on the same experimental platform, with the native genomic DNAs used as the standard controls. For sequence representation, the signal of the two alleles of each SNP from a panel of multiple native DNAs, provided by the experimental platform (Illumina), was used as the reference, so that an abstract signal for sequence representation of each SNP and for all SNPs was obtained.
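The two genotyping metrics used in this record, call rate and concordance over the SNPs called in both amplicon and native reference, can be illustrated with a minimal Python sketch. The genotype encoding ("AA", "AB", "BB", with `None` for a no-call), the function names, and the toy data are assumptions made for illustration; they are not taken from the study or from the Illumina software.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # assumed encoding: "AA", "AB", "BB", or None for a no-call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of SNPs that received a genotype call."""
    if not genotypes:
        raise ValueError("no SNPs supplied")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def concordance(amplicon: Sequence[Genotype], native: Sequence[Genotype]) -> float:
    """Fraction of identical calls among SNPs called in BOTH samples."""
    if len(amplicon) != len(native):
        raise ValueError("samples must cover the same SNPs in the same order")
    both_called = [(a, n) for a, n in zip(amplicon, native)
                   if a is not None and n is not None]
    if not both_called:
        raise ValueError("no SNP was called in both samples")
    matches = sum(1 for a, n in both_called if a == n)
    return matches / len(both_called)


# Hypothetical five-SNP example (not data from the study).
native_calls = ["AA", "AB", "BB", "AB", None]
amplicon_calls = ["AA", "AA", "BB", None, "AB"]

print(f"amplicon call rate: {call_rate(amplicon_calls):.2f}")
print(f"concordance:        {concordance(amplicon_calls, native_calls):.2f}")
```

Restricting concordance to SNPs called in both samples keeps it independent of call rate, which is why the two figures are reported separately.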
Project description:Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). We generated sixteen million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias towards higher mapping rates of the allele in the reference sequence, compared to the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ~5-10% of SNPs still have an inherent bias towards more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome, and analyzing the simulation output are available upon request from JFD.
Project description:Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). We generated sixteen million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias towards higher mapping rates of the allele in the reference sequence, compared to the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ~5-10% of SNPs still have an inherent bias towards more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome, and analyzing the simulation output are available upon request from JFD. RNA-Seq on two YRI HapMap cell lines. Each individual was sequenced on two lanes of the Illumina Genome Analyzer.
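The reference bias described above, and the effect of masking known SNP positions, can be illustrated with a toy Python sketch. This is not the authors' Perl/R code: the function names, the use of "N" as the mask character, and the ten-base reference are assumptions made for illustration, and a simple mismatch count stands in for a real read mapper.

```python
from typing import Iterable


def mask_snps(sequence: str, snp_positions: Iterable[int], mask_char: str = "N") -> str:
    """Return a copy of the reference with each known SNP position (0-based) masked."""
    bases = list(sequence)
    for pos in snp_positions:
        if not 0 <= pos < len(bases):
            raise IndexError(f"SNP position {pos} lies outside the sequence")
        bases[pos] = mask_char
    return "".join(bases)


def mismatches(read: str, reference: str, start: int) -> int:
    """Count mismatching bases when the read is placed at `start` on the reference."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs past the end of the reference")
    return sum(1 for r, w in zip(read, window) if r != w)


# Hypothetical reference with a known A/G SNP at 0-based position 4.
reference = "ACGTACGTAC"
masked = mask_snps(reference, [4])

ref_allele_read = "GTACG"  # carries the reference allele (A)
alt_allele_read = "GTGCG"  # carries the alternative allele (G)

# Unmasked: the alternative-allele read carries one extra mismatch,
# so a mapper with a mismatch limit would favour the reference allele.
print(mismatches(ref_allele_read, reference, 2), mismatches(alt_allele_read, reference, 2))

# Masked: both reads mismatch the masked base equally.
print(mismatches(ref_allele_read, masked, 2), mismatches(alt_allele_read, masked, 2))
```

The sketch only shows why masking equalises the mismatch penalty at the SNP itself. It does not model the residual bias the record reports for ~5-10% of SNPs after masking.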
Project description:Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a procedure using phi29 DNA polymerase and trehalose, with optimized control of amplification, to create micrograms of specific amplicons without TIPs from as little as sub-femtogram amounts of DNA. The amplicons from 5 ng and 0.5 ng DNA, derived from good-quality gDNA (05-050) or partially degraded gDNA (04-018), were validated with the Illumina HumanHap550-Duo Genotyping Beadchip. As seen in Suppl. Table 5a, the call rates (97.30% to 99.07%) and accuracy or concordance (> 99.85% for the SNPs called in both amplicon and natural reference) for 5 ng-derived amplicons with both Wpa and Gv2 were close to each other and close to native gDNA (call rate: 98.3% to 99.75%). These call rates were better than those in a recent report (amplicon 95.9% vs. un-amplified 98.5%), in which the early kit Repli-g 625S was applied, re-genotyping was performed when the performance was low, and duplicate samples were filtered for the highest call rate. The genotyping accuracy of Wpa was in the same range as the variation in technical replicates with similar SNP typing arrays (99.87% and 99.88%, replicated Affymetrix array, or between Affymetrix and Illumina arrays). Importantly, the genotyping concordance for amplicons generated from 0.5 ng with Wpa (99.88% and 99.69%) was also close to that of the technical replicates. In this case, the call rates of Wpa were slightly reduced compared to those with 5 ng input, but the call rate for the partially degraded sample 04-018 was modestly improved over Gv2 (92.06% vs. 90.53%). Wpa data also showed some amplification non-uniformity among different locations, resulting in some “artificial CNVs” similar to Gv2 (exemplified in Suppl. Fig. 5 and Suppl. Table 6), with the outputs obtained by taking unamplified gDNAs as their reference.
This imbalance, however, was consistent and reproducible for each method but differed between Wpa and Gv2. These artificial CNVs can be efficiently cancelled if pair-wise amplified test and reference are compared, as observed in the CGH result (Fig. 4 and Suppl. Fig. 4) and as also supported by others {Pugh 2008}. It is interesting to note that the representation of chromosomal terminal sequences was greatly improved with Wpa compared with Gv2 (Fig. 5), and that some of these regions were significantly under-amplified or even lost with Gv2 (Suppl. Fig. 5 and Suppl. Tables 6 and 7), as also independently reported recently {Pugh 2008}. This occurred especially in the terminal 3 to 5 Mb, sometimes extending to 10 Mb, in many chromosome termini, and was particularly serious when low levels of DNA or degraded DNA were used as input. An analysis for the 5 Mb termini is shown (Suppl. Table 5b treats all involved SNPs as a cohort; Fig. 5 and Suppl. Tables 6 and 7 give the result for each chromosome terminus). Importantly, the SNP typing was also greatly improved, best exemplified by the amplicons of 0.5 ng input for the partially degraded 04-018, with Wpa versus Gv2 call rates of 91.9% vs. 84.45% and accuracy of 99.57% vs. 98.62%. The result also showed that the underrepresentation of these terminal regions in Gv2 was not strictly associated with the distance to the chromosome end, but was possibly a sequence-related issue. Keywords: Whole-pool amplification, whole genome SNP typing
Project description:Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a procedure using phi29 DNA polymerase and trehalose, with optimized control of amplification, to create micrograms of specific amplicons without TIPs from as little as sub-femtogram amounts of DNA. The amplicons from 5 ng and 0.5 ng DNA, derived from good-quality gDNA (05-050) or partially degraded gDNA (04-018), faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22, and even a homozygous deletion smaller than 1 kb, with high resolution chromosome-wide CGH. Specifically, HR-CGH with 5 ng-input gDNA-derived amplicon detected all previously known chromosomal segmental aberrations in chromosome 22 in samples from two different probands, and was indistinguishable from the HR-CGH result with native gDNA from the same probands (Fig. 4, Fig. S3 and S4). The break points were also precisely demonstrated. These include a heterozygous genomic segmental duplication (3 copies, 3 Mb in size, sample 05-050, Fig. 4) and 2 different heterozygous deletions (1 copy, 1.4 Mb and 18 kb respectively, sample 04-018, Fig. S4), all of which are located in or bounded by regions of low copy repeats (LCRs). In addition, a previously known homozygous deletion of 975 bp (in 04-018 and 05-050) was again accurately demonstrated (05-050 data shown in Fig. 4c), although sometimes (04-018) the data were a little noisier than with unamplified DNA (Fig. S4d). In contrast, Wpa-40°C resulted in abundant signal noise and failed to detect these copy number aberrations (Fig. 4, Fig. S3, S4). Impressively, HR-CGH with 0.5 ng gDNA-derived amplicons via Wpa also clearly detected the known CNVs, although with more noise (Fig. S4).
The 0.1 ng gDNA-derived amplicons via Wpa could not unambiguously show CNVs because of higher signal variability, but the CNV patterns were mostly well maintained (Fig. S4 for 04-018). We also noticed some locus imbalance in the amplicon; however, this was minimized, was reproducible when the input was above a certain threshold amount, and could be well compensated if the same amplified reference sample was applied in parallel, as shown above. Keywords: Whole-pool amplification, high resolution comparative genome hybridization (HR-CGH)
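The compensation of reproducible locus imbalance by a reference amplified in parallel can be sketched numerically. The model below is an assumption made for illustration, not the study's analysis: each locus signal is taken to be true copy number multiplied by a locus-specific, reproducible amplification gain, and all values are invented.

```python
import math
from typing import List, Sequence


def log2_ratios(test: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Per-locus log2(test / reference), the usual CGH readout."""
    if len(test) != len(reference):
        raise ValueError("test and reference must cover the same loci")
    return [math.log2(t / r) for t, r in zip(test, reference)]


# Hypothetical copy numbers at four loci; the third locus is duplicated in the test sample.
test_copies = [2.0, 2.0, 3.0, 2.0]
reference_copies = [2.0, 2.0, 2.0, 2.0]

# Assumed locus-specific amplification gains, identical for both samples
# because the imbalance is taken to be reproducible for a given method.
amplification_gain = [1.0, 0.5, 1.0, 2.0]

amplified_test = [c * g for c, g in zip(test_copies, amplification_gain)]
amplified_reference = [c * g for c, g in zip(reference_copies, amplification_gain)]

# Amplified test against NATIVE reference: the gain appears as artificial CNVs.
versus_native = log2_ratios(amplified_test, reference_copies)

# Amplified test against AMPLIFIED reference: the shared gain cancels,
# leaving only the true duplication at the third locus.
versus_paired = log2_ratios(amplified_test, amplified_reference)

print("vs native reference:   ", [round(x, 3) for x in versus_native])
print("vs amplified reference:", [round(x, 3) for x in versus_paired])
```

The cancellation is exact here only because the gains are assumed identical in both samples. With inputs too low for the imbalance to be reproducible, the ratio would stay noisy, which is consistent with the threshold behaviour noted above.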
Project description:Goal: To identify copy number variation in normal individuals using high density, non-polymorphic oligonucleotide probes Background DNA sequence diversity within the human genome may be more greatly affected by copy number variations (CNVs) than single nucleotide polymorphisms (SNPs). Although the importance of CNVs in genome wide association studies (GWAS) is becoming widely accepted, the optimal methods for identifying these variants are still under evaluation. We have previously reported a comprehensive view of CNVs in the HapMap DNA collection using high density 500K EA (Early Access) SNP genotyping arrays which revealed greater than 1,000 CNVs ranging in size from 1kb to over 3Mb. Although the arrays used most commonly for GWAS predominantly interrogate SNPs, CNV identification and detection does not necessarily require the use of DNA probes centered on polymorphic nucleotides and may even be hindered by the dependence on a successful SNP genotyping assay. Results In this study, we have designed and evaluated a high density array predicated on the use of non-polymorphic oligonucleotide probes for CNV detection. This approach effectively uncouples copy number detection from SNP genotyping and thus has the potential to significantly improve probe coverage for genome-wide CNV identification. This array, in conjunction with PCR-based, complexity-reduced DNA target, queries over 1.3M independent NspI restriction enzyme fragments in the 200bp to 1100bp size range, which is a several fold increase in marker density as compared to the 500K EA array. In addition, a novel algorithm was developed and validated to extract CNV regions and boundaries. Conclusions Using a well-characterized pair of DNA samples, close to 200 CNVs were identified, of which nearly 50% appear novel yet were independently validated using quantitative PCR. 
The results indicate that non-polymorphic probes provide a robust approach for CNV identification, and the increasing precision of CNV boundary delineation should allow a more complete analysis of their genomic organization. A set of five genomic DNA samples containing different numbers of X chromosomes (1X to 5X sample set, including NA15510 and NA10851) were hybridized to Nsp copy number (CN) arrays in triplicate to evaluate detection of copy number variation using high density, non-polymorphic oligonucleotide probes. 6 Hapmap samples were hybridized to Nsp CN arrays to evaluate Mendelian inheritance of CNVs.
Project description:Goal: To identify copy number variation in normal individuals using high density, non-polymorphic oligonucleotide probes Background DNA sequence diversity within the human genome may be more greatly affected by copy number variations (CNVs) than single nucleotide polymorphisms (SNPs). Although the importance of CNVs in genome wide association studies (GWAS) is becoming widely accepted, the optimal methods for identifying these variants are still under evaluation. We have previously reported a comprehensive view of CNVs in the HapMap DNA collection using high density 500K EA (Early Access) SNP genotyping arrays which revealed greater than 1,000 CNVs ranging in size from 1kb to over 3Mb. Although the arrays used most commonly for GWAS predominantly interrogate SNPs, CNV identification and detection does not necessarily require the use of DNA probes centered on polymorphic nucleotides and may even be hindered by the dependence on a successful SNP genotyping assay. Results In this study, we have designed and evaluated a high density array predicated on the use of non-polymorphic oligonucleotide probes for CNV detection. This approach effectively uncouples copy number detection from SNP genotyping and thus has the potential to significantly improve probe coverage for genome-wide CNV identification. This array, in conjunction with PCR-based, complexity-reduced DNA target, queries over 1.3M independent NspI restriction enzyme fragments in the 200bp to 1100bp size range, which is a several fold increase in marker density as compared to the 500K EA array. In addition, a novel algorithm was developed and validated to extract CNV regions and boundaries. Conclusions Using a well-characterized pair of DNA samples, close to 200 CNVs were identified, of which nearly 50% appear novel yet were independently validated using quantitative PCR. 
The results indicate that non-polymorphic probes provide a robust approach for CNV identification, and the increasing precision of CNV boundary delineation should allow a more complete analysis of their genomic organization. Keywords: Copy number variation (CNV) detection
Project description:Single-cell human genome analysis using whole-genome amplified product is hampered by allele bias during amplification. Using an oligonucleotide SNP array, we examined the nature of the allele bias and its effect on the chromosomal copy number analysis. Keywords: single cell, copy number analysis, whole genome amplification, brain