Project description: In principle, whole-genome sequencing (WGS) of the human genome, even at low coverage, offers higher resolution for genomic copy number variation (CNV) detection than array-based technologies, which are currently the first-tier approach in clinical cytogenetics. There are, however, obstacles to replacing array-based CNV detection with low-coverage WGS, such as cost, turnaround time, and the lack of systematic performance comparisons. With technological advances in WGS library preparation, instrument platforms, and data analysis algorithms, the obstacles imposed by cost and turnaround time are fading. However, a systematic performance comparison between array-based and low-coverage WGS-based CNV detection has yet to be performed. Here, we compared the CNV detection capabilities of WGS (short-insert, 3 kb and 5 kb mate-pair libraries) at 1X, 3X, and 5X coverage with those of commonly used high-resolution arrays in the 1000 Genomes Project CEU genome NA12878. CNV detection was performed using standard analysis methods, and the results were then compared to a list of Gold Standard NA12878 CNVs distilled from the 1000 Genomes Project. Overall, low-coverage WGS detects drastically more Gold Standard CNVs than arrays (approximately 5-fold more on average) and is accompanied by fewer CNV calls without secondary validation. Furthermore, we show that WGS (at ≥1X coverage) detects all seven validated deletions larger than 100 kb in the NA12878 genome, whereas most arrays detect only one such deletion. Finally, we show that the much larger 15 Mbp Cri-du-chat deletion can be clearly seen even at 1X coverage from short-insert WGS.
2018-06-26 | GSE105092 | GEO
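The comparison of CNV calls against a Gold Standard list described in GSE105092 can be illustrated with a minimal sketch. The description does not state the matching criterion, so the 50% reciprocal-overlap rule used here is an assumption, and the names `Interval`, `reciprocal_overlap`, and `count_detected` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def count_detected(gold: List[Interval], calls: List[Interval],
                   threshold: float = 0.5) -> int:
    """Count Gold Standard CNVs matched by at least one call.

    The 0.5 threshold is an assumed convention, not taken from the study.
    """
    return sum(
        1 for g in gold
        if any(reciprocal_overlap(g, c) >= threshold for c in calls)
    )
```

Running `count_detected` once per platform (each array and each WGS library and coverage level) would give the per-platform counts of detected Gold Standard CNVs that such a comparison relies on.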
Project description: Remapping of high coverage WGS reads of the 1000 Genomes Project to GRCh38
Project description: Low-pass sequencing (sequencing a genome to an average depth of less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping (using the Illumina Global Screening Array (GSA)) on 120 DNA samples derived from African- and European-ancestry individuals that are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. We evaluated overall imputation accuracy from these different assays as well as overall power for GWAS from imputed data, and computed polygenic risk scores for coronary artery disease and breast cancer using previously derived weights. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5× and higher compared to the Illumina GSA.
2021-01-31 | GSE165845 | GEO
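The polygenic risk scores mentioned in GSE165845 are computed from imputed genotypes and previously derived weights. A minimal sketch of the usual additive form, a weighted sum of allele dosages, is given here; the function name `polygenic_score`, the variant identifiers, and the choice to skip variants missing from the imputed data are all assumptions for illustration, not the study's actual procedure.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Additive score: sum of weight * effect-allele dosage per variant.

    Dosages are expected effect-allele counts in [0, 2], as produced by
    imputation. Variants without a dosage are skipped (an assumed policy).
    """
    return sum(
        weight * dosages[variant]
        for variant, weight in weights.items()
        if variant in dosages
    )


# Hypothetical variant IDs and weights, for illustration only.
example_dosages = {"var1": 2.0, "var2": 0.5}
example_weights = {"var1": 0.1, "var2": -0.2, "var3": 0.3}
score = polygenic_score(example_dosages, example_weights)
```

Because imputation yields fractional dosages rather than hard genotype calls, the same function applies unchanged to array-imputed and low-pass-imputed data, which is what makes the two assays directly comparable on this metric.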
Project description: Low-coverage WGS
| PRJNA826350 | ENA
Project description: Variant calling on GRCh38 with the 1000 genomes samples
Project description: Whole genome sequencing (WGS) of tongue cancer samples and a cell line was performed to identify the translocation breakpoint of the fusion gene. Raw WGS data were aligned to the human reference genome (GRCh38.p12) using BWA-MEM (v0.7.17). The resulting BAM files were analysed with SvABA (v1.1.3) to identify translocation breakpoints. The breakpoints were annotated with custom scripts against the GENCODE reference GTF (v30). The fusion breakpoints identified by SvABA were additionally confirmed using Manta (v1.6.0).
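The breakpoint annotation step is described only as "custom scripts", which are not available here. As a rough sketch of what such a step might look like, the following reads gene records from GTF-formatted lines and reports the genes that contain a breakpoint position. The function names `parse_gtf_genes` and `genes_at` are hypothetical, and the approach (a linear scan over `gene` features using the `gene_name` attribute) is an assumption rather than the authors' method.

```python
import re
from typing import Iterable, List, Tuple

# (chromosome, start, end, gene name); GTF coordinates are 1-based, inclusive.
Gene = Tuple[str, int, int, str]


def parse_gtf_genes(lines: Iterable[str]) -> List[Gene]:
    """Collect 'gene' features from GTF lines (9 tab-separated columns)."""
    genes: List[Gene] = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match:
            genes.append(
                (fields[0], int(fields[3]), int(fields[4]), match.group(1))
            )
    return genes


def genes_at(chrom: str, pos: int, genes: List[Gene]) -> List[str]:
    """Return names of genes whose span contains the 1-based position."""
    return [
        name for c, start, end, name in genes
        if c == chrom and start <= pos <= end
    ]
```

For a translocation, calling `genes_at` on each of the two breakpoint coordinates would give the candidate fusion partners. A linear scan is adequate for a handful of breakpoints; an interval index would be a more suitable choice for large call sets.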