Project description:The skin commensal yeast Malassezia is associated with several skin disorders. To establish a reference resource, we sought to determine the complete genome sequence of Malassezia sympodialis and identify its protein-coding genes. A novel genome annotation workflow combining RNA sequencing, proteomics, and manual curation was developed to determine gene structures with high accuracy.
Project description:Limited functional annotation of the Z. mobilis genome is a current barrier to both basic studies of Z. mobilis and its development as a synthetic-biology chassis. To gain insight, we collected sample-matched multiomics data including RNA-seq, transcription start site sequencing (TSS-seq), termination sequencing (term-seq), ribosome profiling, and label-free shotgun proteomic mass spectrometry across different growth conditions to improve annotation and assign functional sites in the Z. mobilis genome. Proteomics and ribosome profiling informed revisions of protein-coding genes, which included 44 start codon changes and 42 added proteins.
Project description:Intervention type:DRUG. Intervention1:Huaier, Dose form:GRANULES, Route of administration:ORAL, intended dose regimen:20 to 60/day by either bulk or split for 3 months to extended term if necessary. Control intervention1:None.
Primary outcome(s): For mRNA libraries, the focus is on mRNA studies. Data analysis includes sequencing data processing, basic sequencing data quality control, prediction of new transcripts, and differential expression analysis of genes. Gene Ontology (GO) and the KEGG pathway database are used for annotation and enrichment analysis of up-regulated and down-regulated genes.
For small RNA libraries, data analysis includes sequencing data processing and quality control, small RNA distribution across the genome, alignment with rRNA, tRNA, snRNA and snoRNA, construction of known miRNA expression patterns, prediction of new miRNAs, and study of their secondary structure. Based on the miRNA expression patterns, we perform GO/KEGG annotation and enrichment as well as differential expression analysis. Timepoint:RNA sequencing of 240 blood samples of 80 cases and its analysis, scheduled from June 30, 2022.
Project description:We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long-reads and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb of raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from three different tissue types from three other squid species (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome.
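Note on the scaffold N50 reported above: N50 is conventionally the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. The following is a minimal sketch of that standard definition, not the pipeline used by the authors; the function name `n50` and the toy scaffold lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Standard definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches at
    least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not data from the record.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

In this toy case the total length is 300, and the two longest scaffolds (80 + 70) already reach half of it, so the N50 is 70. The same function applies to contig lengths when a contig N50 is wanted.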
Project description:This dataset includes RNAseq data of 7 tissues/developmental stages of Lathyrus sativus genotype LSWT11 and 2 tissues with drought- and well-watered treatments of Lathyrus sativus genotypes LS007 and Mahateora. These data were used in the functional annotation pipeline of the Rbp1.0 genome assembly of LS007. The multi-tissue transcriptome was also used to support gene candidate identification by mRNA abundance. Also included is Hi-C sequencing data used to scaffold the assembly into pseudochromosomes.
Project description:The incomplete genome annotation of non-model organisms hampers molecular and proteomic studies. Proteomics informed by transcriptomics (PIT) is suited to non-model organisms because peptides are identified using transcriptomic, not genomic, data. Aedes aegypti is the mosquito vector for the (re-)emerging dengue, chikungunya, yellow fever and Zika viruses. An Ae. aegypti genome sequence is available; however, experimental evidence is lacking for >90% of the Ae. aegypti proteome and for the activity of the transposable elements (TEs) that constitute 50% of the Ae. aegypti genome. We used PIT to characterise the proteome of the Ae. aegypti-derived cell line Aag2. Hotspots of incomplete genome annotation were identified which are not explained by poor sequence and assembly quality. We developed criteria for the characterisation of proteomically active TEs and demonstrate that protein expression does not correlate with a TE’s genomic abundance. Finally, we identify Phasi Charoen-like virus as an unrecognised contaminant of Aag2 cells. We therefore present the first proteomic characterisation of mobile genetic elements, and provide proof-of-principle that PIT can evaluate a genome’s annotation to guide annotation efforts.
Project description:Vongsangnak2008 - Genome-scale metabolic network of Aspergillus oryzae (iWV1314)
This model is described in the article:
Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae. Vongsangnak W, Olsen P, Hansen K, Krogsgaard S, Nielsen J. BMC Genomics 2008; 9: 245
Abstract:
BACKGROUND: Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released, but hypothetical proteins accounted for more than 50% of the annotated genes. Considering the industrial importance of this fungus, it is therefore valuable to improve the annotation and further integrate genomic information with biochemical and physiological information available for this microorganism and other related fungi. Here we proposed gene prediction by construction, sequencing and assembly of an A. oryzae Expressed Sequence Tag (EST) library. We enhanced function assignment using an annotation strategy we developed. The resulting improved annotation was used to reconstruct the metabolic network, leading to a genome-scale metabolic model of A. oryzae.
RESULTS: From our assembled EST sequences we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Notably, our annotation strategy resulted in assignment of new putative functions to 1,469 hypothetical proteins already present in the A. oryzae genome database. Using the substantially improved annotated genome we reconstructed the metabolic network of A. oryzae. This network contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073 metabolites and 1,846 (1,053 unique) biochemical reactions. The metabolic reactions are compartmentalized into the cytosol, the mitochondria, the peroxisome and the extracellular space. Transport steps between the compartments and the extracellular space represent 281 reactions, of which 161 are unique. The metabolic model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources.
CONCLUSION: A much enhanced annotation of the A. oryzae genome was performed and a genome-scale metabolic model of A. oryzae was reconstructed. The model accurately predicted the growth and biomass yield on different carbon sources. The model serves as an important resource for gaining further insight into our understanding of A. oryzae physiology.
This model is hosted on BioModels Database and identified by: MODEL1507180056.
To cite BioModels Database, please use: BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models.
To the extent possible under law, all copyright and related or neighbouring rights to this encoded model have been dedicated to the public domain worldwide. Please refer to CC0 Public Domain Dedication for more information.
Project description:Macaque species share over 93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. To close this gap and enhance functional genomics approaches, we employed a combination of de novo linked-read assembly and scaffolding using proximity ligation assay (Hi-C) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large chromosome-level scaffolds with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells (iPSCs) derived from the same animal. Reconstruction of the evolutionary tree using whole genome annotation and orthologous comparisons among three macaque species, human and mouse genomes revealed extensive homology between humans and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques exhibit a closer evolutionary distance to each other than either species exhibits to humans or pig-tailed macaques. These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.
Project description:<p><em>Tripterygium wilfordii</em> is a vine from the family Celastraceae used in Traditional Chinese Medicine (TCM). The active ingredient celastrol is a friedelane-type pentacyclic triterpenoid, with a putative role as an anti-tumor, immunosuppressive, and anti-obesity agent. Here we report a reference genome assembly of <em>T. wilfordii</em> with high-quality annotation obtained using a hybrid sequencing strategy, yielding a total genome size of 340.12 Mb, a contig N50 of 3.09 Mb, 31,593 structural genes, and a repeat content of 44.31%. Comparative evolutionary analyses showed that <em>T. wilfordii</em> diverged from species of Malpighiales about 102.4 million years ago. In addition, we successfully anchored 91.02% of the sequences into 23 pseudochromosomes using Hi-C technology, and the super-scaffold N50 reached 13.03 Mb. Based on integration of genome, transcriptome and metabolite analyses, as well as in vivo and in vitro enzyme assays of the two CYP450 genes, <em>TwCYP712K1</em> and <em>TwCYP712K2</em>, the second biosynthesis step of celastrol was investigated and elucidated. Syntenic analysis revealed that <em>TwCYP712K1</em> and <em>TwCYP712K2</em> derived from a common ancestor. These results provide insights for further investigation of the celastrol pathway, offer valuable information to aid the conservation of resources, and help reveal the evolution of Celastrales.</p>
Project description:The complete assembly of vast and complex plant genomes, like the hexaploid wheat genome, remains challenging. Here, we present CS-IAAS, a comprehensive telomere-to-telomere (T2T) gap-free Triticum aestivum L. reference genome, encompassing 14.51 billion base pairs and featuring all 21 centromeres and 42 telomeres. Annotation revealed 90.8 Mb of additional centromeric satellite arrays and 5,611 ribosomal DNA (rDNA) units. Genome-wide rearrangements, centromeric elements, TE expansion, and segmental duplications during tetraploidization and hexaploidization were deciphered, providing a comprehensive understanding of wheat subgenome evolution. Among them, TE insertions during hexaploidization greatly influenced gene expression balances, thus increasing genome plasticity at the transcriptional level. Additionally, we generated 163,329 full-length cDNA sequences and proteomic data that helped annotate 141,035 high-confidence (HC) protein-coding genes. However, in this hexaploid genome, 20.05%, 33.43%, and 42.76% of gene transcript levels, alternative splicing events, and protein levels, respectively, were found to be unbalanced among subgenomes. The complete T2T reference genome (CS-IAAS), along with its transcriptome and proteome, represents a significant step in our understanding of wheat genome complexity, and provides insights for future wheat research and breeding.