Project description:The northern white rhinoceros (Ceratotherium simum cottoni) genome and annotation were previously published, but the annotation contained few genes, had many misalignments, and used nomenclature that did not match HGNC/VGNC naming conventions, making transcriptional studies very difficult. We performed RNA sequencing on granulosa cells collected in vivo and used de novo transcript assembly with StringTie to identify all gene sequences present in our samples. Through extensive manual curation we generated a greatly improved genome annotation, increasing the number of genes by 81%. This will enable researchers in this field to use the genome and annotation for transcriptional studies in this species.
Project description:Accurate annotation of transcript isoforms is crucial for understanding gene function, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data remain imprecise. We developed Bookend, a software package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. Through end-guided assembly with Bookend, we demonstrate that correct modeling of transcript start and end sites is essential for precise transcript assembly. Furthermore, we discovered that using the end-labeled reads present in full-length single-cell RNA-seq (scRNA-seq) datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells (mESCs), can produce end-to-end transcript annotations of comparable quality to reference annotations in these model organisms.
Project description:<p><em>Tripterygium wilfordii</em> is a vine of the family Celastraceae used in Traditional Chinese Medicine (TCM). The active ingredient celastrol is a friedelane-type pentacyclic triterpenoid with putative anti-tumor, immunosuppressive, and anti-obesity activity. Here we report a reference genome assembly of <em>T. wilfordii</em> with high-quality annotation, obtained using a hybrid sequencing strategy, with a total genome size of 340.12 Mb, a contig N50 of 3.09 Mb, 31,593 structural genes, and a repeat content of 44.31%. Comparative evolutionary analyses showed that <em>T. wilfordii</em> diverged from species of Malpighiales about 102.4 million years ago. In addition, we anchored 91.02% of the sequences onto 23 pseudochromosomes using Hi-C technology, and the super-scaffold N50 reached 13.03 Mb. Based on integration of genome, transcriptome, and metabolite analyses, as well as in vivo and in vitro enzyme assays of the two CYP450 genes <em>TwCYP712K1</em> and <em>TwCYP712K2</em>, the second biosynthesis step of celastrol was investigated and elucidated. Syntenic analysis revealed that <em>TwCYP712K1</em> and <em>TwCYP712K2</em> derived from a common ancestor. These results provide insights for further investigating pathways for celastrol and valuable information to aid the conservation of resources, and they help reveal the evolution of Celastrales.</p>