Project description: Structural variations (SVs) contribute significantly to the variability of the human genome, and extensive genomic rearrangements are a hallmark of cancer. Genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs. The current application of PET sequencing with short insert size DNA is insufficient for the comprehensive mapping of SVs in low-complexity and repeat-rich genomic regions. We have developed a robust procedure to generate PET sequencing data using large DNA inserts of 10-20 kb for the identification of SVs. We compared the characteristics of the large insert libraries with those of short insert (1 kb) libraries at the same sequencing depth and cost. Although short insert libraries have an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. Large inserts are superior to short inserts in providing higher physical genome coverage and therefore achieve greater sensitivity for the identification of the different types of SVs, including copy-number-neutral and complex events. Further, large inserts allow the identification of SVs within repetitive sequences that cannot be spanned by short inserts.
Structural variations of three cancer cell lines were identified using short (1 kb) and long (10 kb and 20 kb) insert size DNA fragments.
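
The physical genome coverage argument above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis: it uses the common approximation that physical coverage equals the number of mapped pairs times the insert size divided by the genome size. The function name, the genome size of 3 Gb, and the count of 30 million pairs are assumptions chosen for illustration and do not come from the project.

```python
def physical_coverage(num_pairs: int, insert_size_bp: int, genome_size_bp: int) -> float:
    """Approximate physical (span) coverage of a paired-end-tag library.

    Each mapped pair is counted as covering the whole insert between its
    two tags, so coverage scales linearly with insert size.
    """
    return num_pairs * insert_size_bp / genome_size_bp


# Illustrative values only (assumptions, not figures from the study).
GENOME_SIZE_BP = 3_000_000_000  # rough size of a human genome
NUM_PAIRS = 30_000_000          # same number of pairs for every library

for insert_size_bp in (1_000, 10_000, 20_000):
    coverage = physical_coverage(NUM_PAIRS, insert_size_bp, GENOME_SIZE_BP)
    print(f"{insert_size_bp // 1000} kb insert: {coverage:.0f}x physical coverage")
```

Under these assumed numbers, the same number of sequenced pairs yields ten and twenty times the physical coverage for the 10 kb and 20 kb libraries relative to the 1 kb library, which is the mechanism the description credits for the higher sensitivity of large inserts.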