Recovery of cable bacteria genomes from metagenomic samples
ABSTRACT: Metagenome sequencing of samples suspected of containing cable bacteria genomic DNA, using short-read (Illumina NovaSeq) and long-read (Nanopore R9.4.1) sequencing platforms.
Project description:Transposon insertion site sequencing (TIS) is a powerful method for associating genotype to phenotype. However, all TIS methods described to date use short nucleotide sequence reads, which cannot uniquely determine the locations of transposon insertions within repeating genomic sequences where the repeat units are longer than the sequence read length. To overcome this limitation, we have developed a TIS method using Oxford Nanopore sequencing technology that generates and uses long nucleotide sequence reads; we have called this method LoRTIS (Long Read Transposon Insertion-site Sequencing). This experiment contains sequence files generated using the Nanopore and Illumina platforms. Biotin1308.fastq.gz and Biotin2508.fastq.gz are FASTQ files generated using the Nanopore platform. Rep1-Tn.fastq.gz and Rep1-Tn.fastq.gz are FASTQ files generated using the Illumina platform. In this study, we have compared the efficiency of the two methods in identifying transposon insertion sites.
Project description:Low-GC Actinobacteria are among the most abundant and widespread microbes in freshwaters and have largely resisted all cultivation efforts. Consequently, their phages have remained totally unknown. In this work, we have used deep metagenomic sequencing to assemble eight complete genomes of the first tailed phages that infect freshwater Actinobacteria. Their genomes encode the actinobacterial-specific transcription factor whiB, frequently found in mycobacteriophages and also in phages infecting marine pelagic Actinobacteria. Its presence suggests a common and widespread strategy of modulation of host transcriptional machinery upon infection via this transcriptional switch. We present evidence that some whiB-carrying phages infect the acI lineage of Actinobacteria. At least one of them encodes the ADP-ribosylating component of the widespread bacterial AB toxins family (for example, clostridial toxin). We posit that the presence of this toxin reflects a 'trojan horse' strategy, providing protection at the population level to the abundant host microbes against eukaryotic predators.
Project description:The Pan-Cancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in more than 2,800 cancer whole genomes from the International Cancer Genome Consortium. Building upon previous work which examined cancer coding regions, this project is exploring the nature and consequences of somatic and germline variations in both coding and non-coding regions, with specific emphasis on cis-regulatory sites, non-coding RNAs, and large-scale structural alterations. Read more on the project website (https://dcc.icgc.org/pcawg). This is a subset featuring RNA-seq transcription profiling data of 27 cancer subtypes in 19 tissues. Some donors have matched normal tissue. As general reference, a subset of normal tissue samples from the GTEx project were included in this experiment.
Project description:RNA viruses cause significant human pathology and are responsible for the majority of emerging zoonoses. Mainstream diagnostic assays are challenged by their intrinsic diversity, leading to false negatives and incomplete characterisation. New sequencing techniques are expanding our ability to agnostically interrogate nucleic acids within diverse sample types, but in the clinical setting are limited by overwhelming host material and ultra-low target frequency. Through selective host RNA depletion and compensatory protocol adjustments for ultra-low RNA inputs, we are able to detect three major blood-borne RNA viruses - HIV, HCV and HEV. We recovered complete genomes and up to 43% of the genome from samples with viral loads of 10^4 and 10^3 IU/ml respectively. Additionally, we demonstrated the utility of this method in detecting and characterising members of diverse RNA virus families within a human plasma background, some present at very low levels. By applying this method to a patient sample series, we have simultaneously determined the full genome of both a novel subtype of HCV genotype 6, and a co-infecting human pegivirus. This method builds upon earlier RNA metagenomic techniques and can play an important role in the surveillance and diagnostics of blood-borne viruses.
Project description:The Pan-Cancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in more than 2,800 cancer whole genomes from the International Cancer Genome Consortium. Building upon previous work which examined cancer coding regions, this project is exploring the nature and consequences of somatic and germline variations in both coding and non-coding regions, with specific emphasis on cis-regulatory sites, non-coding RNAs, and large-scale structural alterations. Read more on the project website (https://dcc.icgc.org/pcawg). This is a subset featuring RNA-seq transcription profiling data of 27 cancer subtypes in 19 tissues. Some donors have matched normal tissue. This is the alternative view of the experiment for Expression Atlas to show gene expression per donor.
Project description:In this study, we compared the two long-read sequencing platforms, namely the single-molecule real-time sequencing by Pacific Biosciences and nanopore sequencing by Oxford Nanopore Technologies, for the analysis of cell-free DNA from plasma. Artificial mixtures of sonicated human and mouse DNA at different sizes were sequenced with the two platforms.
Project description:Tuberous sclerosis complex (TSC) is a relatively common autosomal dominant disorder characterized by multiple dysplastic organ lesions and neuropsychiatric symptoms, caused by loss-of-function mutation of either TSC1 or TSC2. Target-capture full-length double-stranded cDNA sequencing using the Nanopore long-read sequencer (Nanopore Long-read Target Sequencing) revealed various TSC1 and TSC2 full-length transcripts and novel intron-retention transcripts of TSC2 in a TSC patient. Our results indicate that Nanopore Long-read Target Sequencing is useful for the detection of mutations and provides information on full-length alternatively spliced transcripts for genetic diagnosis.
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short-reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each one displayed distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets for both simple studies and in larger projects. These properties will enable TALON users to move beyond the limitations of short-read data to perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.