Project description:Deconvolution methods infer quantitative cell type estimates from bulk measurement of mixed samples including blood and tissue. DNA methylation sequencing measures multiple CpGs per read, but few existing deconvolution methods leverage this within-read information. We develop CelFiE-ISH, which extends an existing method (CelFiE) to use within-read haplotype information. CelFiE-ISH outperforms CelFiE and other existing methods, achieving 30% better accuracy and more sensitive detection of rare cell types. We also demonstrate the importance of marker selection and tailoring markers for haplotype-aware methods. While here we use gold-standard short-read sequencing data, haplotype-aware methods will be well-suited for long-read sequencing.
Project description:We evaluated linked-read whole genome sequencing (WGS) for detection of structural chromosomal rearrangements in primary samples of varying DNA quality from 12 patients diagnosed with ALL. Linked-read WGS enabled precise, allele-specific, digital karyotyping at base-pair resolution for a wide range of structural variants, including complex rearrangements, aneuploidy assessment and gene deletions. Additional RNA-sequencing and copy number aberration (CNA) data from Illumina Infinium arrays were also generated and assessed against the linked-read WGS data. RNA-sequencing data were used to support structural chromosomal rearrangements detected in the linked-read WGS data by detecting fusion genes expressed as a consequence of the rearrangements. Illumina Infinium arrays (450k array and/or SNP array) were used to assess CNA status to further support the findings in the linked-read WGS data. The processed CNA data from the primary ALL patient samples have been deposited in GEO. The RNA-sequencing, linked-read WGS, and raw SNP array data from the primary ALL patient samples will not be deposited because the patient/parent consent does not cover depositing, in a repository, data that may be used for large-scale determination of germline variants. The ALL samples were collected 10-20 years ago from pediatric patients aged 2-15 years, some of whom are deceased. The linked-read WGS and RNA-sequencing data sets generated in the study are available upon reasonable request from the corresponding author (Jessica.Nordlund@medsci.uu.se).
Project description:Constructing high-quality haplotype-resolved genome assemblies has substantially improved the ability to detect and characterize genetic variants. A targeted approach providing ready access to the rich information in haplotype-resolved genome assemblies will appeal to basic researchers and medical scientists focused on specific genomic regions. Here, using the 4.5 megabase, notoriously difficult-to-assemble major histocompatibility complex (MHC) region as an example, we demonstrate an approach to construct a haplotype-resolved assembly of a targeted genomic region using CRISPR-based enrichment. Compared with haplotype-resolved whole-genome assembly, our targeted approach achieved comparable completeness and accuracy while reducing computing complexity, sequencing cost, and the amount of starting material. Moreover, using the targeted, assembled personal MHC haplotypes as the reference both improves quantification accuracy for sequencing data and enables allele-specific functional genomics analyses of the MHC region. Given its highly efficient use of resources, our approach can greatly facilitate population genetic studies of targeted regions and may pave a new way to elucidate the molecular mechanisms of disease etiology.
Project description:Genomic and epigenomic sequencing of 5 oesophageal adenocarcinomas with evidence of chromothripsis. Genomic sequencing includes PacBio circular consensus sequencing, PacBio continuous long-read sequencing, 10x linked-read sequencing and Illumina HiSeq X Ten sequencing. Epigenomic sequencing includes Hi-C chromosome conformation capture, ATAC-seq, ChIP-seq (for H3K27ac, H3K4me3, H3K27me3 and CTCF) and long-read RNA sequencing. All data types include BAM files that have not undergone haplotype resolution (marked as unresolved), and some data types also include haplotype-resolved reads (marked as resolved).
Project description:Using long-read nanopore sequencing, we obtained chromosome-wide phased methylomes of the active and inactive X in mouse placenta and neural stem cells (NSCs), overcoming the limitations of short-read bisulfite sequencing in allelic resolution. We also conducted quantitative analysis of methylation properties such as symmetry and entropy, providing a more comprehensive view of epigenetic silencing in X chromosome inactivation. Finally, we resolved the allele-specific genetics and epigenetics of the structural macrosatellite Dxz4 and other repeats.