Project description:In order to polish a long-read genome assembly, short-read illumina data was obtained from Heterodera schachtii cysts (Woensdrecht population from IRS, the Netherlands). Cysts where obtained from infected plant material. Nematodes were cleaned using a sucrose gradient centrifugation step. Thereafter DNA was extracted and used for library preparation and sequencing by Illumina NextSeq500.
Project description:Here we performed a comprehensive genomic and proteomics analysis of P. stutzeri in aerobic and oxygen-limiting conditions. We combined de novo genome assembly relying on 3rd generation long read sequencing technologies to report the first complete P. stutzeri ATCC14405 genome, which added over 110 kb of sequence and contains 126 full length CDS that were only partially covered in the fragmented short read-based genome assembly available for this strain. With this optimal basis for downstream functional genomics, we next carried out state of the art bottom-up and top-down proteomics analyses to report the most detailed study of proteome remodeling in response to oxygen limitation in P. stutzeri. We identified more than 2900 proteins, i.e. greater than 70% of the theoretical proteome, including 160 annotated small proteins. The proteins included well-established enzymes involved in denitrification and metabolic adaptation to oxygen-limiting conditions, as well as uncharacterized proteins. Notably, we identified 16 novel small proteins that had so far been missed in the genome annotation.
Project description:The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention in the scientific community for its unique potential to unveil novel insights in the fields of medicine, biochemistry, and evolution. NMRs exhibit unique adaptations that include protracted fertility, cancer resistance, eusociality, and anoxia. This suite of adaptations is not found in other rodent species, suggesting that interrogating conserved and accelerated regions in the NMR genome will find regions of the NMR genome fundamental to their unique adaptations. However, the current NMR genome assembly has limits that make studying structural variations, heterozygosity, and non-coding adaptations challenging. We present a complete diploid naked-mole rat genome assembly by integrating long-read and 10X-linked read genome sequencing of a male NMR and its parents, and Hi-C sequencing in the NMR hypothalamus (N=2). Reads were identified as maternal, paternal or ambiguous (TrioCanu). We then polished genomes with Flye, Racon and Medaka. Assemblies were then scaffolded using the following tools in order: Scaff10X, Salsa2, 3d-DNA, Minimap2-alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and finding male-biased contigs not present in the maternal genome. Contigs were assembled with publicly available male NMR Fibroblast Hi-C-seq data (SRR820318). Both assemblies have their sex chromosome haplotypes merged so that both assemblies have a high-quality X and Y chromosome. Finally, assemblies were evaluated with Quast, BUSCO, and Merqury, which all reported the base-pair quality and contiguity of both assemblies as high-quality. The assembly will next be annotated by Ensembl using public RNA-seq data from multiple tissues (SRP061363). Together, this assembly will provide a high-quality resource to the NMR and comparative genomics communities.
Project description:The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention in the scientific community for its unique potential to unveil novel insights in the fields of medicine, biochemistry, and evolution. NMRs exhibit unique adaptations that include protracted fertility, cancer resistance, eusociality, and anoxia. This suite of adaptations is not found in other rodent species, suggesting that interrogating conserved and accelerated regions in the NMR genome will find regions of the NMR genome fundamental to their unique adaptations. However, the current NMR genome assembly has limits that make studying structural variations, heterozygosity, and non-coding adaptations challenging. We present a complete diploid naked-mole rat genome assembly by integrating long-read and 10X-linked read genome sequencing of a male NMR and its parents, and Hi-C sequencing in the NMR hypothalamus (N=2). Reads were identified as maternal, paternal or ambiguous (TrioCanu). We then polished genomes with Flye, Racon and Medaka. Assemblies were then scaffolded using the following tools in order: Scaff10X, Salsa2, 3d-DNA, Minimap2-alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and finding male-biased contigs not present in the maternal genome. Contigs were assembled with publicly available male NMR Fibroblast Hi-C-seq data (SRR820318). Both assemblies have their sex chromosome haplotypes merged so that both assemblies have a high-quality X and Y chromosome. Finally, assemblies were evaluated with Quast, BUSCO, and Merqury, which all reported the base-pair quality and contiguity of both assemblies as high-quality. The assembly will next be annotated by Ensembl using public RNA-seq data from multiple tissues (SRP061363). Together, this assembly will provide a high-quality resource to the NMR and comparative genomics communities.
Project description:<p>Recently developed methods that utilize partitioning of long genomic DNA fragments, and barcoding of shorter fragments derived from them, have succeeded in retaining long-range information in short sequencing reads. These so-called read cloud approaches represent a powerful, accurate, and cost-effective alternative to single-molecule long-read sequencing. We developed software, GROC-SVs, that takes advantage of read clouds for structural variant detection and assembly. We apply the method to two 10x Genomics data sets, one chromothriptic sarcoma with several spatially separated samples, and one breast cancer cell line, all Illumina-sequenced to high coverage. Comparison to short-fragment data from the same samples, and validation by mate-pair data from a subset of the sarcoma samples, demonstrate substantial improvement in specificity of breakpoint detection compared to short-fragment sequencing, at comparable sensitivity, and vice versa. The embedded long-range information also facilitates sequence assembly of a large fraction of the breakpoints; importantly, consecutive breakpoints that are closer than the average length of the input DNA molecules can be assembled together and their order and arrangement reconstructed, with some events exhibiting remarkable complexity. These features facilitated an analysis of the structural evolution of the sarcoma. In the chromothripsis, rearrangements occurred before copy number amplifications, and using the phylogenetic tree built from point mutation data, we show that single nucleotide variants and structural variants are not correlated. We predict significant future advances in structural variant science using 10x data analyzed with GROC-SVs and other read cloud-specific methods.</p>
Project description:New tools for improved long-read transcript assembly and coalescence with its short-read counterpart are required. Using our short- and long-read measurements from different cell lines with spiked-in standards, we systematically compared key parameters and biases in the read alignment and assembly of transcripts. We report a cDNA synthesis artifact in long-read datasets that impacts the identity and quantitation of assembled transcripts. We developed a computational pipeline to strand long-read cDNA libraries that markedly improves assembly of transcripts from long-reads. Incorporating stranded long-reads in a new hybrid assembly approach, we demonstrate its efficacy for improved characterization of challenging lncRNA transcripts. Our workflow can be applied to a wide range of transcriptomics datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
Project description:New tools for improved long-read transcript assembly and coalescence with its short-read counterpart are required. Using our short- and long-read measurements from different cell lines with spiked-in standards, we systematically compared key parameters and biases in the read alignment and assembly of transcripts. We report a cDNA synthesis artifact in long-read datasets that impacts the identity and quantitation of assembled transcripts. We developed a computational pipeline to strand long-read cDNA libraries that markedly improves assembly of transcripts from long-reads. Incorporating stranded long-reads in a new hybrid assembly approach, we demonstrate its efficacy for improved characterization of challenging lncRNA transcripts. Our workflow can be applied to a wide range of transcriptomics datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
Project description:Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read and nanopore long-read direct RNA sequencing approaches, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed an additional 35 novel viral transcripts. These RNAs include fourteen new splice junctions which lead to expression of canonical open reading frames (ORF), six novel ORF-containing transcripts, and fifteen transcripts encoding for messages that potentially alter protein functions through truncations or fusion of canonical ORFs. In addition, we also detect RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Of these, an evolutionary conserved protein was detected containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.
Project description:Objectives: To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods: PBMCs were exposed to four microbial stimuli for 24 hours: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, Candida albicans, and RPMI medium as negative controls. Long-read sequencing (PacBio) of one donor and secretome proteomics and short-read sequencing of five donors were performed. IsoQuant was used for transcriptome construction, Metamorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results: Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression on the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion: Isoform aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of these methods to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.