Project description:We evaluated linked-read whole genome sequencing (WGS) for detection of structural chromosomal rearrangements in primary samples of varying DNA quality from 12 patients diagnosed with acute lymphoblastic leukemia (ALL). Linked-read WGS enabled precise, allele-specific, digital karyotyping at base-pair resolution for a wide range of structural variants, including complex rearrangements, aneuploidy assessment and gene deletions. Additional RNA-sequencing data and copy number aberration (CNA) data from Illumina Infinium arrays were also generated and assessed against the linked-read WGS data. RNA-sequencing data were used to support structural chromosomal rearrangements detected in the linked-read WGS data by detecting the expressed fusion genes that result from the rearrangements. Illumina Infinium arrays (450k array and/or SNP array) were used to assess CNA status to further support the findings in the linked-read WGS data. The processed CNA data from the primary ALL patient samples have been deposited in GEO. RNA-sequencing data, linked-read WGS data, and raw SNP array data from the primary ALL patient samples will not be deposited in a repository because the patient/parent consent does not cover depositing data that may be used for large-scale determination of germline variants. The ALL samples were collected 10-20 years ago from pediatric patients aged 2-15 years, some of whom are deceased. The linked-read WGS and RNA-sequencing data sets generated in the study are available upon reasonable request from the corresponding author, Jessica.Nordlund@medsci.uu.se.
Project description:The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR). Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved. Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH), in whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping and StrandSeq, the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ~2.2-5.5 kb with 100% nucleotide similarity. These data provide experimental evidence supporting the hypothesis that inverted low-copy repeats (LCRs) act as a recombinant substrate. This type of CGR can result in multiple conformers, which contribute to generating diverse SV haplotypes in susceptible loci.
Project description:Neuroblastoma, like many childhood cancers, exhibits a relative paucity of somatic single nucleotide variants (SNVs). Here, we assess the contribution of structural variation (SV) in neuroblastoma using a combination of whole genome sequencing (WGS; n=135) and single nucleotide polymorphism (SNP) genotyping (n=914) of matched tumor-normal pairs. Our study design provided means for orthogonal validation of SVs as well as validation across genomic platforms. SV frequency, type, and localization varied significantly among high-risk tumors, with MYCN non-amplified tumors harboring an increased SV burden overall (P = 1.12 × 10⁻⁵). Genes disrupted by SV breakpoints were enriched in neuronal lineages and autism spectrum disorder. The postsynaptic adapter protein-coding gene SHANK2, located on chromosome 11q13, was disrupted by SVs in 14% and 10% of MYCN non-amplified high-risk tumors based on WGS and SNP array cohorts, respectively. Forced expression of SHANK2 in neuroblastoma cell models resulted in significant growth inhibition (P = 2.62 × 10⁻² to 3.4 × 10⁻⁵) and accelerated neuronal differentiation following treatment with all-trans retinoic acid (P = 3.08 × 10⁻¹³ to 2.38 × 10⁻³⁰). These data further define the complex landscape of structural variation in neuroblastoma and suggest that events leading to deregulation of neurodevelopmental processes, such as inactivation of SHANK2, are key mediators of tumorigenesis.
Project description:Genome-wide association studies (GWAS) have been highly informative in discovering disease-associated loci, but are not designed to capture all structural variations in the human genome. Using long-read sequencing data, we discovered widespread structural variation within SVA (SINE-VNTR-Alu) elements, a class of great ape-specific transposable elements with gene-regulatory roles, which represents a major source of structural variability in the human population. We highlight the presence of structurally variable SVAs (SV-SVAs) in neurological disease-associated loci, and further associate SV-SVAs with disease-associated SNPs and differential gene expression using luciferase assays and expression quantitative trait loci data. Finally, we genetically deleted SV-SVAs in the BIN1 and CD2AP Alzheimer-associated risk loci and in the BCKDK Parkinson disease-associated risk locus and assessed multiple aspects of their gene-regulatory influence in a human neuronal context. Together, this study reveals a novel layer of genetic variation in transposable elements that may contribute to identification of the structural variants that are the actual drivers of disease associations at GWAS loci.
Project description:Whole genome sequencing (WGS) of tongue cancer samples and a cell line was performed to identify the translocation breakpoint of the fusion gene. Raw WGS reads were aligned to the human reference genome (GRCh38.p12) using BWA-MEM (v0.7.17). The resulting BAM files were analysed with SvABA (v1.1.3) to identify translocation breakpoints. The translocation breakpoints were annotated with custom scripts against the GENCODE reference GTF (v30). The fusion breakpoints identified in the SvABA analysis were additionally confirmed with Manta (v1.6.0).
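The breakpoint annotation step described above can be illustrated with a minimal Python sketch. It is not the authors' custom script, which is not part of this record. It assumes only that the annotation file follows the standard tab-separated GTF layout with `gene` records carrying a `gene_name` attribute, as GENCODE files do. The function names (`load_genes`, `genes_at`, `annotate_translocation`), the `TOY_GTF` records, and the gene names `GENE_A` and `GENE_B` are hypothetical and exist only for illustration.

```python
import io
import re

# GENCODE-style attribute, e.g. gene_name "XYZ";
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def load_genes(handle):
    """Collect (chrom, start, end, name) for every 'gene' record in a GTF stream."""
    genes = []
    for line in handle:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # ignore exons, transcripts and malformed lines
        match = GENE_NAME.search(fields[8])
        name = match.group(1) if match else "unknown"
        genes.append((fields[0], int(fields[3]), int(fields[4]), name))
    return genes


def genes_at(genes, chrom, pos):
    """Names of genes whose 1-based inclusive interval contains the position."""
    return sorted(n for c, s, e, n in genes if c == chrom and s <= pos <= e)


def annotate_translocation(genes, bp1, bp2):
    """Label a breakpoint pair, e.g. 'GENE_A::GENE_B'; empty hits become 'intergenic'."""
    left = genes_at(genes, *bp1) or ["intergenic"]
    right = genes_at(genes, *bp2) or ["intergenic"]
    return "/".join(left) + "::" + "/".join(right)


# Toy records with made-up coordinates (hypothetical, not taken from GENCODE v30).
TOY_GTF = (
    "##description: toy annotation\n"
    'chr1\tTOY\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
    'chr1\tTOY\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
    'chr2\tTOY\tgene\t200\t900\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n'
)

genes = load_genes(io.StringIO(TOY_GTF))
print(annotate_translocation(genes, ("chr1", 2500), ("chr2", 300)))
```

A linear scan over all genes is adequate for a handful of breakpoints. For many calls, an interval index per chromosome would be a more suitable design.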
Project description:This project aimed to unravel the fine architecture of neocentromeres found in three well-differentiated liposarcoma (WDLPS) cell lines. The neocentromeres appear as patchworks of multiple short amplified sequences, disclosing a much higher complexity than previously reported. Next-generation sequencing data (WGS, RNA-seq, CENP-A ChIP-seq) are available at the Sequence Read Archive (BioProject ID: PRJNA378952).