Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each technology has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as expression levels, across datasets. It serves both simple studies and larger projects such as ENCODE that seek to decode transcriptional regulation in the human and mouse genomes and to predict more accurate expression levels of genes and transcripts than is possible with short reads alone.
Project description:Transposon insertion site sequencing (TIS) is a powerful method for associating genotype with phenotype. However, all TIS methods described to date use short nucleotide sequence reads, which cannot uniquely determine the locations of transposon insertions within repeating genomic sequences where the repeat units are longer than the sequence read length. To overcome this limitation, we have developed a TIS method using Oxford Nanopore sequencing technology that generates and uses long nucleotide sequence reads; we have called this method LoRTIS (Long Read Transposon Insertion-site Sequencing). The experimental data contain sequence files generated using the Nanopore and Illumina platforms. Biotin1308.fastq.gz and Biotin2508.fastq.gz are fastq files generated using Nanopore technology. Rep1-Tn.fastq.gz and Rep1-Tn.fastq.gz are fastq files generated using the Illumina platform. In this study, we have compared the efficiency of the two methods in identifying transposon insertion sites.
Project description:We explored changes at the gene level and the transcript level in embryonic stem cells, before and after in vitro differentiation with retinoic acid. RNA was sequenced both with Illumina short reads and with Oxford Nanopore Technologies cDNA and direct RNA sequencing.
Project description:Nitrate-reducing iron(II)-oxidizing bacteria are widespread in the environment, contribute to nitrate removal, and influence the fate of the greenhouse gases nitrous oxide and carbon dioxide. The autotrophic growth of nitrate-reducing iron(II)-oxidizing bacteria is rarely investigated and poorly understood. The most prominent model system for this type of study is enrichment culture KS, which originates from a freshwater sediment in Bremen, Germany. To gain insights into the metabolism of nitrate reduction coupled to iron(II) oxidation in the absence of organic carbon and under oxygen-limited conditions, we performed metagenomic, metatranscriptomic and metaproteomic analyses of culture KS. Raw sequencing data of 16S rRNA amplicon sequencing, shotgun metagenomics (short reads: Illumina; long reads: Oxford Nanopore Technologies), the metagenome assembly, and raw sequencing data of shotgun metatranscriptomes (2 conditions, triplicates) can be found at SRA in https://www.ncbi.nlm.nih.gov/bioproject/PRJNA682552. This dataset contains proteomics data for 2 conditions (heterotrophic and autotrophic growth conditions) in triplicates.
Project description:Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short-read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read and nanopore long-read direct RNA sequencing approaches, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed an additional 35 novel viral transcripts. These RNAs include fourteen new splice junctions which lead to expression of canonical open reading frames (ORFs), six novel ORF-containing transcripts, and fifteen transcripts encoding messages that potentially alter protein functions through truncations or fusion of canonical ORFs. We also detect RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Of these, an evolutionarily conserved protein was detected containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.
Project description:Sequencing was performed to assess the ability of Nanopore direct cDNA and native RNA sequencing to characterise human transcriptomes. Total RNA was extracted from either HAP1 or HEK293 cells, and the polyA+ fraction was isolated using oligo(dT) Dynabeads. Libraries were prepared using Oxford Nanopore Technologies (ONT) kits according to the manufacturer's instructions. Samples were then sequenced on ONT R9.4 flow cells to generate fast5 raw reads in the ONT MinKNOW software. Fast5 reads were then base-called using the ONT Albacore software to generate fastq reads.
Project description:In this study, we used a barcoding-based synthetic long read (SLR) isoform sequencing approach (LoopSeq) to generate sequencing reads sufficiently long and accurate to identify isoforms using standard short read Illumina sequencers.