Project description:Microarray-based enrichment of selected genomic loci is a powerful method for genome complexity reduction. Since the vast majority of exons in vertebrate genomes are smaller than 150 nt, we have explored the use of short fragment libraries (85-110 bp) to achieve higher enrichment specificity by reducing carryover and adverse effects of flanking intronic sequences. These short fragment libraries were enriched for 1.69 Mb of exonic sequences, using custom 244K microarrays, and sequenced using AB/SOLiD. High enrichment specificity (60-75%) was obtained at 67-213x average coverage, with 77-92% and 90-98% of targeted regions covered with more than 25% and 10% of the average coverage, respectively. As a more appropriate measure of the evenness of coverage, which is relatively independent of sequencing depth, we introduce the evenness of coverage parameter E. E values up to 75% were achieved. To verify the accuracy of SNP/mutation detection we evaluated 384 known non-reference SNPs in the targeted regions. At ~200x average sequence coverage, we were able to survey 96.4% of 1.69 Mb of genomic sequence with only 4.2% false negative calls while 3.6% of targeted regions were marked as unsurveyed. A total of 1197 new variants were detected. Verification revealed only 8 false positive calls, resulting in an overall false positive rate of less than 1 per ~200,000 bp (0.0005%, equivalent to an overall phred score of 55). 4 samples + capture design file
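The coverage and accuracy figures quoted in this description come down to a few lines of arithmetic on per-base depth. The following is a minimal sketch, not the authors' pipeline: the function names and the toy depth values are hypothetical, and the evenness parameter E is left out because the description does not define it.

```python
import math

def fraction_covered(depths, fraction_of_mean):
    """Share of targeted positions whose depth exceeds fraction_of_mean * mean depth."""
    mean_depth = sum(depths) / len(depths)
    threshold = fraction_of_mean * mean_depth
    return sum(1 for d in depths if d > threshold) / len(depths)

def phred(error_rate):
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    return -10 * math.log10(error_rate)

# Toy per-base depths for ten targeted positions (illustrative values only).
depths = [0, 20, 40, 80, 100, 120, 150, 200, 250, 255]

print(fraction_covered(depths, 0.25))   # share above 25% of the mean depth
print(fraction_covered(depths, 0.10))   # share above 10% of the mean depth
print(round(phred(1 / 200_000), 1))     # Phred score for 1 error per 200,000 bp
```

With this formula, a rate of exactly 1 in 200,000 bases gives roughly Q53, and any lower rate gives a higher score.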
Project description:Low-coverage whole-genome sequencing of endometrial cancer-derived organoids. Organoids were established from patients with endometrial diseases. DNA was extracted from low-passage and high-passage organoids and compared with the primary tissue, when available, to investigate whether organoids retain the same genomic abnormalities and disease-associated features.
Project description:Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. Genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs. The current application of PET sequencing with short insert size DNA is insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We have developed a robust procedure to generate PET sequencing data using large DNA inserts of 10 - 20 kb for the identification of SVs. We compared the characteristics of the large insert libraries with short insert (1 kb) libraries with the same sequencing depths and costs. Although short insert libraries bear an advantage in identifying small deletions, they do not provide a significantly better breakpoint resolution. Large inserts are superior to short inserts in providing higher physical genome coverage and therefore achieve greater sensitivity for the identification of the different types of SVs, including copy number neutral and complex events. Further, large inserts allow the identification of SVs within repetitive sequences which cannot be spanned by short inserts. Structural variations of three cancer cell lines using short (1 kb) and long (10 kb and 20 kb) insert size DNA fragments
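The physical-coverage argument in this description is simple arithmetic: each paired-end tag spans its whole insert, so at a fixed number of mapped pairs the physical coverage scales with insert size. A minimal sketch, assuming a hypothetical count of 10 million mapped pairs and a round 3 Gb genome (neither figure comes from the study):

```python
def physical_coverage(n_pairs, insert_size_bp, genome_size_bp):
    """Average number of inserts spanning a genomic position."""
    return n_pairs * insert_size_bp / genome_size_bp

# Assumed values for illustration only.
GENOME_SIZE_BP = 3_000_000_000
N_PAIRS = 10_000_000

for insert_size_bp in (1_000, 10_000, 20_000):
    cov = physical_coverage(N_PAIRS, insert_size_bp, GENOME_SIZE_BP)
    print(f"{insert_size_bp // 1000} kb inserts: {cov:.1f}x physical coverage")
```

At equal pair counts, the 10 kb and 20 kb libraries span each position 10 and 20 times as often as the 1 kb library, which is why more breakpoints end up bracketed by at least one pair.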