Analysis of differential gene expression and alternative splicing is significantly influenced by choice of mapping genome
ABSTRACT: Purpose: To demonstrate that gene expression and splicing analyses vary considerably depending on the reference genome used for read mapping. Methods: We mapped and analyzed submitted RNA-seq reads using different tools and reference genomes to evaluate the influence of the reference genome on differential gene expression (DEG) and alternative splicing analysis tools. Results: We observed that the differences in transcriptome analysis results are due, in part, to single nucleotide polymorphisms (SNPs) between the sequenced individual and each reference genome, as well as to annotation differences between the reference genomes, which exist even between syntenic orthologs. Conclusion: We conclude that, even between two closely related genomes of similar quality, using the reference genome most closely related to the species being sampled significantly improves transcriptome analysis.
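The mechanism named in the Results (SNPs between the sequenced individual and the reference degrading alignment) can be illustrated with a toy, ungapped best-hit mapper. This is a minimal sketch, not any tool used in the study: the sequences are hypothetical, real RNA-seq aligners use gapped, splice-aware alignment with quality-aware scoring, and `max_mismatches` is an assumed stand-in for an aligner's mismatch limit.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read: str, reference: str, max_mismatches: int = 2):
    """Slide the read along the reference without gaps and return
    (0-based position, mismatches) of the best placement, or None if every
    placement exceeds `max_mismatches` (the read would go unmapped)."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


# Hypothetical read from the sequenced individual and two toy references:
# the "close" one differs from it at 1 SNP, the "distant" one at 3 SNPs.
read = "ACGTTGCAAGTC"
close_ref = "TTT" + "ACGTTGCATGTC" + "GGG"
distant_ref = "TTT" + "ACCTTGGATGTC" + "GGG"

print("close", best_hit(read, close_ref))      # close (3, 1)
print("distant", best_hit(read, distant_ref))  # distant None
```

With the assumed cutoff of two mismatches, the read places on the closer reference but goes unmapped on the more distant one, so it would be missing from any downstream expression count.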
Project description: Background: Lycophyta species are the extant taxa most similar to the early vascular plants that were once abundant on Earth. However, their distribution has greatly diminished. So far, the absence of chromosome-level lycophyte genome assemblies has hindered our understanding of the evolution and environmental adaptation of lycophytes. Findings: We present the reference genome of the tetraploid aquatic quillwort Isoetes sinensis, a lycophyte. This is the first chromosome-level genome assembly of a tetraploid seed-free plant. Comparison of the I. sinensis genome with that of the diploid I. taiwanensis revealed conserved and distinct genomic features between diploid and polyploid lycophytes. Comparison of the I. sinensis genome with those of other species representing the evolutionary lineages of green plants showed that I. sinensis has inherited the genetic toolkits for transcriptional regulation and for most phytohormones. The presence and absence of key genes related to development and stress responses provide insights into the environmental adaptation of lycophytes. Conclusions: The high-quality reference genome and genomic analysis presented in this study are crucial for future genetic research and environmental studies of not only I. sinensis but also other lycophytes.
Project description: Background: Lycophyta species are the extant taxa most similar to the early vascular plants that were once abundant on Earth. However, their distribution has greatly diminished. So far, the absence of chromosome-level lycophyte genome assemblies has hindered our understanding of the evolution and environmental adaptation of lycophytes. Findings: We present the reference genome of the tetraploid aquatic quillwort Isoetes sinensis, a lycophyte. This is the first chromosome-level genome assembly of a tetraploid seed-free plant. Comparison of the I. sinensis genome with that of the diploid I. taiwanensis revealed conserved and distinct genomic features between diploid and polyploid lycophytes. Comparison of the I. sinensis genome with those of other species representing the evolutionary lineages of green plants showed that I. sinensis has inherited the genetic toolkits for transcriptional regulation and for most phytohormones. The presence and absence of key genes related to development and stress responses provide insights into the environmental adaptation of lycophytes. Conclusions: The high-quality reference genome and genomic analysis presented in this study are crucial for future genetic research and the conservation of not only I. sinensis but also other lycophytes.
Project description: The Global Pandemic Lineage (GPL) of the amphibian pathogen Batrachochytrium dendrobatidis (Bd) has been described as a main driver of amphibian extinctions on nearly every continent. Near-complete genomes of three Bd-GPL strains have enabled studies of the pathogen, but the genomic features that set Bd-GPL apart from other Bd lineages are not well understood because high-quality genome assemblies and annotations from other lineages are lacking. We used long-read DNA sequencing to assemble high-quality genomes of three Bd-BRAZIL isolates and of one non-pathogenic outgroup species, Polyrhizophydium stewartii (Ps) strain JEL0888, and compared these with the genomes of previously sequenced Bd-GPL strains. The Bd-BRAZIL assemblies range in size from 22.0 to 26.1 Mb and encode 8,495 to 8,620 protein-coding genes per strain. Our pan-genome analysis provided insight into shared and lineage-specific gene content. The core genome of Bd consists of 6,278 conserved gene families, with 202 Bd-BRAZIL-specific and 172 Bd-GPL-specific gene families. We discovered gene copy number variation in pathogenicity gene families between Bd-BRAZIL and Bd-GPL strains, although none were consistently expanded in either Bd-GPL or Bd-BRAZIL strains. Comparison within the Batrachochytrium genus and with two closely related non-pathogenic saprophytic chytrids identified variation in sequence and in protein domain counts. We further tested these new Bd-BRAZIL genomes to assess their utility as reference genomes for transcriptome alignment and analysis. Our analysis examines the genomic variation between Bd-BRAZIL and Bd-GPL strains and offers insights into the use of these genomes as reference genomes for future studies.
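The core versus lineage-specific split in a pan-genome analysis can be expressed as set operations on a gene-family presence/absence table. This is a minimal sketch under stated assumptions: the strain and family IDs are hypothetical, gene families are assumed to have been clustered beforehand by some orthology tool, and "lineage-specific" is taken here to mean present in at least one strain of the lineage and absent from every strain of the other lineages, which may not match the study's exact criterion.

```python
from typing import Dict, Set, Tuple


def classify_families(
    presence: Dict[str, Set[str]], lineage: Dict[str, str]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """presence: strain -> IDs of gene families present in that strain.
    lineage: strain -> lineage label (e.g. "BRAZIL" or "GPL").
    Returns (core families shared by all strains,
             {lineage: families seen only in that lineage's strains})."""
    core = set.intersection(*presence.values())
    specific: Dict[str, Set[str]] = {}
    for lin in sorted(set(lineage.values())):
        inside = set().union(*(presence[s] for s in presence if lineage[s] == lin))
        outside = set().union(*(presence[s] for s in presence if lineage[s] != lin))
        specific[lin] = inside - outside
    return core, specific


# Hypothetical presence/absence table for four strains in two lineages.
presence = {
    "BRAZIL_1": {"F1", "F2", "F3", "F4"},
    "BRAZIL_2": {"F1", "F2", "F4"},
    "GPL_1": {"F1", "F2", "F5"},
    "GPL_2": {"F1", "F2", "F5", "F6"},
}
lineage = {"BRAZIL_1": "BRAZIL", "BRAZIL_2": "BRAZIL", "GPL_1": "GPL", "GPL_2": "GPL"}

core, specific = classify_families(presence, lineage)
print(sorted(core))                                    # ['F1', 'F2']
print({k: sorted(v) for k, v in specific.items()})     # {'BRAZIL': ['F3', 'F4'], 'GPL': ['F5', 'F6']}
```

A stricter definition (present in every strain of the lineage) would only change how `inside` is built, from a union to an intersection.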
Project description: The complete assembly of vast and complex plant genomes, such as the hexaploid wheat genome, remains challenging. Here, we present CS-IAAS, a comprehensive telomere-to-telomere (T2T), gap-free Triticum aestivum L. reference genome that spans 14.51 billion base pairs and includes all 21 centromeres and 42 telomeres. Annotation revealed 90.8 Mb of additional centromeric satellite arrays and 5,611 ribosomal DNA (rDNA) units. Genome-wide rearrangements, centromeric elements, transposable element (TE) expansion, and segmental duplications that arose during tetraploidization and hexaploidization were characterized, providing a comprehensive understanding of wheat subgenome evolution. Among these, TE insertions during hexaploidization strongly influenced the balance of gene expression, thereby increasing the plasticity of the genome at the transcriptional level. Additionally, we generated 163,329 full-length cDNA sequences and proteomic data that helped annotate 141,035 high-confidence (HC) protein-coding genes. In this hexaploid genome, imbalance among subgenomes was detected for 20.05% of gene transcript levels, 33.43% of alternative splicing events, and 42.76% of protein levels. The complete T2T reference genome (CS-IAAS), together with its transcriptome and proteome, represents a significant step in our understanding of wheat genome complexity and provides insights for future wheat research and breeding.
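The description does not say how subgenome imbalance was called. One published convention for wheat homoeolog triads (Ramírez-González et al., 2018) normalizes the A, B, and D abundances to relative proportions and assigns each triad to the nearest of seven ideal categories by Euclidean distance. The sketch below implements that convention purely as an illustration; it is an assumption that CS-IAAS used a comparable rule, and the TPM inputs are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

# Ideal relative abundances (A, B, D) for the seven homoeolog categories.
IDEAL: Dict[str, Tuple[float, float, float]] = {
    "balanced": (1 / 3, 1 / 3, 1 / 3),
    "A dominant": (1.0, 0.0, 0.0),
    "B dominant": (0.0, 1.0, 0.0),
    "D dominant": (0.0, 0.0, 1.0),
    "A suppressed": (0.0, 0.5, 0.5),
    "B suppressed": (0.5, 0.0, 0.5),
    "D suppressed": (0.5, 0.5, 0.0),
}


def classify_triad(a: float, b: float, d: float) -> str:
    """Assign a triad (abundance of the A, B and D homoeologs, e.g. in TPM)
    to the category whose ideal vector is closest in Euclidean distance."""
    total = a + b + d
    if total <= 0:
        raise ValueError("triad has no detectable abundance")
    rel = (a / total, b / total, d / total)
    return min(IDEAL, key=lambda cat: math.dist(rel, IDEAL[cat]))


def fraction_unbalanced(triads: Iterable[Tuple[float, float, float]]) -> float:
    """Share of triads not assigned to the 'balanced' category."""
    labels = [classify_triad(*t) for t in triads]
    return sum(label != "balanced" for label in labels) / len(labels)


# Hypothetical TPM values for four triads.
example = [(10, 10, 10), (30, 2, 1), (1, 20, 20), (12, 10, 11)]
for triad in example:
    print(triad, classify_triad(*triad))
print(fraction_unbalanced(example))  # 0.5
```

The same classifier applies unchanged to any per-homoeolog abundance (transcripts, splice-event usage, or protein levels), which is why one rule can yield separate imbalance percentages for each data type.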
Project description: Gene expression was quantified by RNA-seq of total RNA, using the Ovation Kit for model organisms, in corneas from mice without intervention: 4 naive corneas from BALB/c mice and 4 from C57BL/6N mice. Reads from C57BL/6N mice align better to the standard reference, which represents a closely related strain, and this could produce false-positive differential expression. To avoid this while retaining the standard reference genome annotation, two pseudogenomes were generated by incorporating the known variants into the reference, and the reads were aligned to the resulting genomes. The BAM files were then converted back to the standard reference with Lapels, which converts genome coordinates and adjusts CIGAR strings. Expression can then be quantified again against the standard gene model (here, Ensembl release 94).
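The pseudogenome strategy depends on keeping a map between pseudogenome and reference coordinates so that alignments can be moved back onto the standard annotation. The following is a minimal sketch of that idea, not Lapels itself: variants are given as hypothetical (0-based position, REF, ALT) tuples in VCF-style allele notation, they must not overlap, and only per-base position lifting is shown; the CIGAR-string adjustment that Lapels also performs is not reproduced.

```python
from typing import List, Optional, Tuple

Variant = Tuple[int, str, str]  # (0-based reference position, REF allele, ALT allele)


def build_pseudogenome(reference: str, variants: List[Variant]):
    """Apply non-overlapping variants to `reference`.
    Returns (pseudogenome, pseudo_to_ref), where pseudo_to_ref[i] is the
    reference coordinate of pseudogenome base i, or None for inserted bases."""
    seq: List[str] = []
    pseudo_to_ref: List[Optional[int]] = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported in this sketch")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele does not match the reference at {pos}")
        # Copy the unchanged reference bases up to the variant.
        seq.append(reference[cursor:pos])
        pseudo_to_ref.extend(range(cursor, pos))
        # The shared prefix of REF/ALT maps base-to-base (SNPs and the anchor
        # base of VCF-style indels); extra ALT bases are insertions (no
        # reference coordinate) and extra REF bases are deletions (skipped).
        shared = min(len(ref), len(alt))
        seq.append(alt)
        pseudo_to_ref.extend(range(pos, pos + shared))
        pseudo_to_ref.extend([None] * (len(alt) - shared))
        cursor = pos + len(ref)
    seq.append(reference[cursor:])
    pseudo_to_ref.extend(range(cursor, len(reference)))
    return "".join(seq), pseudo_to_ref


# Hypothetical reference with one SNP, one 2-bp insertion and one 1-bp deletion.
reference = "ACGTACGTAC"
variants = [(2, "G", "T"), (5, "C", "CGG"), (7, "TA", "T")]
pseudo, p2r = build_pseudogenome(reference, variants)
print(pseudo)  # ACTTACGGGTC
print(p2r)     # [0, 1, 2, 3, 4, 5, None, None, 6, 7, 9]
print(p2r[8])  # 6: a read aligned at pseudogenome position 8 starts at reference position 6
```

Inserted bases map to None because they have no reference coordinate; a lifted read covering them needs an insertion (I) operation in its CIGAR string, and a read spanning the deleted base needs a deletion (D), which is the kind of adjustment the description attributes to Lapels.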
Project description: Background: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. Gene structure prediction remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, combining proteomics datasets with bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes. Results: Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus siliculosus, a key reference genome for both the brown algal lineage and the stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus strains and from subcellular fractions using a shotgun approach. First, we generated peptide sequence tags (PSTs) directly from the proteomics data. Second, we mapped the PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions, using parameters optimized for the organism. Third, we evaluated each cluster and compared it with gene predictions from existing conventional genome annotation approaches. Finally, we integrated the cluster locations into GFF files for visualization in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We validated the results experimentally by RT-PCR and with transcriptomics data. Conclusions: Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed as a Docker image available from two public bioinformatics Docker repositories: Docker Hub and BioShaDock. The workflow is also accessible through the Galaxy framework, for use by non-computer scientists, at https://galaxy.protim.eu.
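The mapping and clustering steps (PSTs searched against the translated genome, then nearby hits grouped into candidate coding regions) can be sketched as a six-frame translation, an exact tag search, and single-linkage clustering by distance on each strand. This is a hypothetical illustration, not Peptimapper's implementation: real PSTs carry flanking mass gaps and must tolerate ambiguities such as Leu/Ile, and the `max_gap` parameter stands in for the organism-specific clustering parameters the description mentions without specifying.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

Hit = Tuple[str, int, int]  # (strand, start, end): 0-based, half-open, forward-strand coordinates


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    # Codons with non-ACGT characters (e.g. N) become 'X'; a trailing partial codon is dropped.
    return "".join(CODON.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_tag(genome: str, tag: str) -> List[Hit]:
    """Exact search of a peptide tag in all six reading frames."""
    genome = genome.upper()
    n = len(genome)
    hits: List[Hit] = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            p = protein.find(tag)
            while p != -1:
                s = frame + 3 * p
                e = s + 3 * len(tag)
                # Reverse-strand coordinates are converted back to the forward strand.
                hits.append((strand, s, e) if strand == "+" else (strand, n - e, n - s))
                p = protein.find(tag, p + 1)
    return hits


def cluster_hits(hits: List[Hit], max_gap: int) -> List[Tuple[str, int, int, int]]:
    """Merge same-strand hits separated by at most `max_gap` bases.
    Returns (strand, start, end, number_of_hits) per cluster."""
    clusters: List[list] = []
    for strand, s, e in sorted(hits):
        last = clusters[-1] if clusters else None
        if last and last[0] == strand and s - last[2] <= max_gap:
            last[2] = max(last[2], e)
            last[3] += 1
        else:
            clusters.append([strand, s, e, 1])
    return [tuple(c) for c in clusters]


# Hypothetical genome fragment encoding MKWVTF, and two hypothetical tags.
genome = "ATGAAATGGGTTACTTTT"
hits = find_tag(genome, "KWV") + find_tag(genome, "VTF")
print(cluster_hits(hits, max_gap=30))  # [('+', 3, 18, 2)]
```

Single-linkage merging keeps the sketch short; a real workflow would also score clusters (for example by the number of distinct supporting tags) before comparing them with existing gene models.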