Project description:Accurate annotation of transcript isoforms is crucial to understand gene functions, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data remain imprecise. We developed Bookend, a software package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correct modeling of transcript start and end sites is essential for precise transcript assembly. Furthermore, we discovered that utilization of end-labeled reads present in full-length single-cell RNA-seq (scRNA-seq) datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells (mESCs) can produce end-to-end transcript annotations of comparable quality to reference annotations in these model organisms.
Project description:We developed Bookend, a package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5' and 3' ends. We demonstrate that correct identification of transcript start and end sites is essential for precise full-length transcript assembly. Utilization of end-labeled reads present in full-length single-cell RNA-seq datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis thaliana, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells, can produce reference-quality end-to-end transcript annotations.
Project description:Primary hypothesis: Side-to-end anastomosis is non-inferior to the colon J pouch for reconstruction after low anterior resection for rectal cancer with respect to fecal incontinence (Wexner score).
Research questions: Are there differences between side-to-end anastomosis and the colon J pouch in:
* bowel function (fecal incontinence, frequency of bowel movements, rectal urgency, incomplete evacuation)
* quality of life
* sexual function
* urinary function
* postoperative complications
* operation time and institutional costs
Project description:This study aims to investigate the efficacy of the Guided Written Disclosure Protocol (GWDP) in promoting post-traumatic growth through a process of meaning reconstruction in cancer patients at the end of chemotherapy. The intervention is also intended to reduce distress symptoms (i.e. intrusive thoughts, avoidance, depression and anxiety).
Project description:To assess the performance of computational methods for exon identification, transcript reconstruction and expression level quantification from RNA-seq data, 24 protocol variants of 14 independent software programs (AUGUSTUS, Cufflinks, Exonerate, GSTRUCT, iReckon, mGene, mTim, NextGeneid, Oases, SLIDE, Transomics, Trembly, Tromer and Velvet) were evaluated against transcriptome data from human cells and two model organisms. The following supplementary data files are also available from ArrayExpress at http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1730/files/ : (1) Reference annotation used in the study, (2) Alignments used for reference annotation, (3) Predicted exons and transcript models submitted for evaluation and (4) NanoString nCounter probes and detection counts. Each file is described in greater detail in file http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1730/files/E-MTAB-1730_supplemental_files.xlsx .
Project description:The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second, North American, short-read genome assembly (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterization. S. vulgaris vAU combined 10x Genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 6222 scaffolds (7.6 Mb scaffold N50, 94.6% BUSCO completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Species-specific transcript mapping and gene annotation revealed good gene-level assembly and high functional completeness. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, SAAGA) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counterintuitive behaviour in traditional BUSCO metrics, and present BUSCOMP, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. This work expands our knowledge of avian genomes and the available toolkit for assessing and improving genome quality. The new genomic resources presented will facilitate further global genomic and transcriptomic analysis of this ecologically important species.
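The scaffold N50 quoted above is the length of the scaffold at which the cumulative length, summed from the longest scaffold downwards, first reaches half of the total assembly length. A minimal Python sketch of that calculation follows; the function name `n50` and the toy scaffold lengths are illustrative assumptions and are not taken from the starling assembly or its tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Hypothetical helper for illustration only.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings us to >= 50% of the assembly.
        if 2 * running >= total:
            return length


# Toy example (made-up lengths in bp, not real assembly data).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembled sequence, which is why it is reported alongside completeness measures such as BUSCO.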
Project description:The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. We found that 20% of protein coding genes encode multiple protein-localization signals and that, in 20-d-old adult fly heads, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.
Project description:We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracy. Under the evaluation of a real data set, the estimated transcript expression by Strawberry has the highest correlation with Nanostring probe counts, an independent experimental measure of transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
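The general idea of using EM to split ambiguous read counts among transcripts can be sketched as follows. This is a generic, simplified illustration in Python and not Strawberry's implementation: it omits the sequencing-bias correction, and the function name `em_read_fractions`, the input layout (groups of reads compatible with the same set of transcripts) and the toy numbers are all assumptions made for the example.

```python
def em_read_fractions(groups, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each transcript.

    groups  : list of (transcript_indices, read_count) pairs; each pair is a
              set of reads compatible with exactly those transcripts
              (hypothetical input format).
    lengths : effective length of each transcript.
    """
    k = len(lengths)
    theta = [1.0 / k] * k  # start from a uniform guess
    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: share each group's reads among its compatible transcripts,
        # in proportion to theta[t] / lengths[t].
        for transcripts, count in groups:
            weights = {t: theta[t] / lengths[t] for t in transcripts}
            norm = sum(weights.values())
            if norm == 0.0:
                continue
            for t, w in weights.items():
                expected[t] += count * w / norm
        # M-step: the new read fractions are the normalized expected counts.
        total = sum(expected)
        theta = [e / total for e in expected]
    return theta


# Toy example: two transcripts of equal length; 30 reads are unique to
# transcript 0, 10 are unique to transcript 1, and 40 match both.
toy_groups = [({0}, 30), ({1}, 10), ({0, 1}, 40)]
print(em_read_fractions(toy_groups, lengths=[1000.0, 1000.0]))
```

In this toy case the shared reads end up being divided according to the evidence from the uniquely assigned reads, which is the behaviour a latent class model of read origin is meant to capture.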
Project description:PIWI-clade Argonaute proteins repress transposable elements in animal gonads. Their sequence specificity is conferred via bound ~23-30 nt long piRNAs, which are processed from single-stranded precursor RNAs. How transcripts are specified as precursors and processed into stereotypical piRNA populations are central unresolved questions. Here we show that piRNA-guided RNA cleavage in Drosophila results not only in generation of a ping-pong partner piRNA but further triggers efficient 3′-directed and phased primary piRNA biogenesis. Phasing is a feature of primary piRNAs in somatic and germline cells and a consequence of consecutive endonucleolytic cleavage events catalyzed by Zucchini. Formation of 3′ and 5′ ends of flanking piRNAs is therefore tightly coupled. Zucchini also participates in 3′ end formation of secondary piRNAs but its function can be bypassed by additional downstream piRNA-guided cleavages and subsequent precursor trimming. Hallmarks of Zucchini-dependent phased piRNA biogenesis are also evident in mouse testes, pointing to an evolutionarily conserved mechanism of piRNA biogenesis. This study aims to understand how piRNA biogenesis is initiated in the Drosophila germline and to define the role of the nuclease Zucchini/MitoPLD in piRNA biogenesis in Drosophila and mouse by analysing small RNA sequencing data from various genotypes and sensor constructs.