Project description:<p><em>Tripterygium wilfordii</em> is a vine of the family Celastraceae used in Traditional Chinese Medicine (TCM). Its active ingredient, celastrol, is a friedelane-type pentacyclic triterpenoid with putative roles as an anti-tumor, immunosuppressive, and anti-obesity agent. Here we report a reference genome assembly of <em>T. wilfordii</em> with high-quality annotation, produced using a hybrid sequencing strategy, with a total genome size of 340.12 Mb, a contig N50 of 3.09 Mb, 31,593 structural genes, and a repeat content of 44.31%. Comparative evolutionary analyses showed that <em>T. wilfordii</em> diverged from species of Malpighiales about 102.4 million years ago. In addition, we anchored 91.02% of the sequences onto 23 pseudochromosomes using Hi-C technology, and the super-scaffold N50 reached 13.03 Mb. Based on integrated genome, transcriptome, and metabolite analyses, as well as in vivo and in vitro enzyme assays of the two CYP450 genes <em>TwCYP712K1</em> and <em>TwCYP712K2</em>, the second step of celastrol biosynthesis was investigated and elucidated. Syntenic analysis revealed that <em>TwCYP712K1</em> and <em>TwCYP712K2</em> derived from a common ancestor. These results provide insights for further investigation of celastrol pathways, offer valuable information to aid the conservation of resources, and help reveal the evolution of Celastrales.</p>
Project description:Accurate annotation of transcript isoforms is crucial for understanding gene function, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data remain imprecise. We developed Bookend, a software package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. Through end-guided assembly with Bookend, we demonstrate that correct modeling of transcript start and end sites is essential for precise transcript assembly. Furthermore, we discovered that using the end-labeled reads present in full-length single-cell RNA-seq (scRNA-seq) datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells (mESCs), can produce end-to-end transcript annotations of comparable quality to the reference annotations in these model organisms.