Project description:The Long-read POG dataset comprises a cohort of 189 patient tumours and 41 matched normal samples sequenced using the Oxford Nanopore Technologies PromethION platform. This dataset from the Personalized Oncogenomics (POG) program and the Marathon of Hope Cancer Centres Network includes accompanying DNA and RNA short-read sequence data, analytics, and clinical information. We show the potential of long-read sequencing for resolving complex cancer-related structural variants, viral integrations, and extrachromosomal circular DNA. Long-range phasing of variants facilitates the discovery of allelically differentially methylated regions (aDMRs) and allele-specific expression, including recurrent aDMRs in the cancer genes RET and CDKN2A. Germline promoter methylation in MLH1 can be directly observed in Lynch syndrome. Promoter methylation in BRCA1 and RAD51C is a likely driver behind patterns of homologous recombination deficiency where no driver mutation was found. This dataset demonstrates applications for long-read sequencing in precision medicine, and is available as a resource for developing analytical approaches using this technology.
Project description:Human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used orthogonal long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. For non-recurrent events we found microhomology and microhomeology at the breakpoint junctions, an excess of deletion rearrangements on paternally-derived haplotypes, and elucidated recalcitrant breakpoints. Our data indicate an increased rate of clustered single nucleotide variant mutation in cis that is not present with recurrent rearrangement of the genome at the same locus. Indel and single nucleotide mutations are associated with both copy number gains and losses of 17p11.2, occur up to ~1 Mb away from the breakpoint junctions, and favor C>G transversion substitutions; results suggesting that single stranded DNA is formed during the genesis of the SV and providing compelling support for a microhomology-mediated break-induced replication mechanism for SV formation.
Project description:Recently developed methods that utilize partitioning of long genomic DNA fragments, and barcoding of shorter fragments derived from them, have succeeded in retaining long-range information in short sequencing reads. These so-called read cloud approaches represent a powerful, accurate, and cost-effective alternative to single-molecule long-read sequencing. We developed software, GROC-SVs, that takes advantage of read clouds for structural variant detection and assembly. We apply the method to two 10x Genomics data sets, one chromothriptic sarcoma with several spatially separated samples, and one breast cancer cell line, all Illumina-sequenced to high coverage. Comparison to short-fragment data from the same samples, and validation by mate-pair data from a subset of the sarcoma samples, demonstrate substantial improvement in specificity of breakpoint detection compared to short-fragment sequencing, at comparable sensitivity, and vice versa. The embedded long-range information also facilitates sequence assembly of a large fraction of the breakpoints; importantly, consecutive breakpoints that are closer than the average length of the input DNA molecules can be assembled together and their order and arrangement reconstructed, with some events exhibiting remarkable complexity. These features facilitated an analysis of the structural evolution of the sarcoma. In the chromothripsis, rearrangements occurred before copy number amplifications, and using the phylogenetic tree built from point mutation data, we show that single nucleotide variants and structural variants are not correlated. We predict significant future advances in structural variant science using 10x data analyzed with GROC-SVs and other read cloud-specific methods.
Project description:Germline structural variants (SVs) are challenging to identify by conventional genetic testing assays. Long-read sequencing has improved the global characterization of SVs, but its sensitivity at genetic loci associated with high- and moderate-penetrance cancer susceptibility has not been reported. This study used long-read genome sequencing performed on the Oxford Nanopore Technologies' PromethION to resolve variants underlying breast cancer susceptibility in sixteen individuals with pathogenic germline SVs in BRCA1, BRCA2, CHEK2 or PALB2.
Project description:Chromoanagenesis is a descriptive term that encompasses classes of catastrophic mutagenic processes that generate localized and complex chromosome rearrangements in both somatic and germline genomes. Herein we describe a 5-year-old female presenting with a constellation of clinical features consistent with a clinical diagnosis of Coffin-Siris syndrome 1 (CSS1). Initial G-banded karyotyping detected a 90 Mb pericentric and 47 Mb paracentric inversion on a single chromosome. Subsequent analysis using short-read whole genome sequencing and genomic optical mapping revealed additional inversions, all clustered on chromosome 6, one of them disrupting ARID1B, for which haploinsufficiency leads to CSS1. In all, the resolved derivative chromosome architecture presents four de novo inversions, one pericentric and three paracentric, involving six breakpoint junctions in what appears to be a shuffling of genomic material on this chromosome. Each junction was resolved to nucleotide-level resolution with mutational signatures suggestive of non-homologous end joining. The disruption of the gene ARID1B is shown to occur between the 4th and 5th exons of the canonical transcript, with subsequent qPCR studies confirming a decrease in ARID1B expression in the patient versus healthy controls. Deciphering the underlying genomic architecture of chromosomal rearrangements and complex structural variants may require multiple technologies and can be critical to elucidating the molecular etiology of a patient’s clinical phenotype or resolving unsolved Mendelian disease cases.
Project description:RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that only span up to two exon-junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing mitigates this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap - 5 kb), large (Nebulin - 22 kb) and very large (Titin - 106 kb) transcripts in cardiac muscle, and in fast and slow skeletal muscles, identified unannotated exons for each of these ubiquitous muscle genes. This also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type specific differential expression of these novel transcripts. The improved transcript identification and quantification demonstrated by our approach removes previous impediments to studies aimed at quantitative differential expression of ultra-long transcripts.
Project description:We evaluated linked-read whole genome sequencing (WGS) for detection of structural chromosomal rearrangements in primary samples of varying DNA quality from 12 patients diagnosed with ALL. Linked-read WGS enabled precise, allele-specific, digital karyotyping at base-pair resolution for a wide range of structural variants, including complex rearrangements, aneuploidy assessment and gene deletions. Additional RNA-sequencing data and copy number aberration (CNA) data from Illumina Infinium arrays were also generated and assessed against the linked-read WGS data. RNA-sequencing data were used to support structural chromosomal rearrangements detected in the linked-read WGS data by detecting expressed fusion genes arising as a consequence of the rearrangements. Illumina Infinium arrays (450k array and/or SNP array) were used to assess CNA status to further support the findings in the linked-read WGS data. The processed CNA data from the primary ALL patient samples have been deposited to GEO. RNA-sequencing, linked-read WGS data, and raw SNP array data from the primary ALL patient samples will not be deposited because the patient/parent consent does not cover depositing data that may be used for large-scale determination of germline variants in a repository. The ALL samples were collected 10-20 years ago from pediatric patients aged 2-15 years, some of whom are deceased. The linked-read WGS data and the RNA-sequencing data sets generated in the study are available upon reasonable request from the corresponding author Jessica.Nordlund@medsci.uu.se.