Project description:Enhancer RNAs (eRNAs) are a class of non-coding RNAs transcribed from enhancers. As the markers of active enhancers, eRNAs play important roles in gene regulation and are associated with various complex traits and characteristics. With increasing attention to eRNAs, numerous eRNAs have been identified in different human tissues. However, the expression landscape, regulatory network and potential functions of eRNAs in animals have not been fully elucidated. Here, we systematically characterized 185 177 eRNAs from 5085 samples across 10 species by mapping the RNA sequencing data to the regions of known enhancers. To explore their potential functions based on evolutionary conservation, we investigated the sequence similarity of eRNAs among multiple species. In addition, we identified the possible associations between eRNAs and transcription factors (TFs) or nearby genes to decipher their possible regulators and target genes, as well as characterized trait-related eRNAs to explore their potential functions in biological processes. Based on these findings, we further developed Animal-eRNAdb (http://gong_lab.hzau.edu.cn/Animal-eRNAdb/), a user-friendly database for data searching, browsing and downloading. With the comprehensive characterization of eRNAs in various tissues of different species, Animal-eRNAdb may greatly facilitate the exploration of functions and mechanisms of eRNAs.
Project description:The Plant snoRNA database (http://www.scri.sari.ac.uk/plant_snoRNA/) provides information on small nucleolar RNAs from Arabidopsis and eighteen other plant species. Information includes sequences, expression data, methylation and pseudouridylation target modification sites, initial gene organization (polycistronic, single gene and intronic) and the number of gene variants. The Arabidopsis information is divided into box C/D and box H/ACA snoRNAs, and within each of these groups, by target sites in rRNA, snRNA or unknown. Alignments of orthologous genes and gene variants from different plant species are available for many snoRNA genes. Plant snoRNA genes have been given a standard nomenclature, designed wherever possible, to provide a consistent identity with yeast and human orthologues.
Project description:Retrocopies of protein-coding genes, copies of mature RNA that have been reverse transcribed and inserted into the genome, have commonly been categorized as pseudogenes with no biological importance. However, recent studies showed that they play an important role in genome evolution and in shaping interspecies differences. Here, we present RetrogeneDB, a database of retrocopies in 62 animal genomes. RetrogeneDB contains information about retrocopies, their genomic localization, parental genes, ORF conservation, and expression. To the best of our knowledge, this is the most complete retrocopy database, providing information for dozens of species never previously analyzed in the context of protein-coding gene retroposition. The database is available at http://retrogenedb.amu.edu.pl.
Project description:MicroRNAs (miRNA) are approximately 21 nucleotide-long non-coding small RNAs, which function as post-transcriptional regulators in eukaryotes. miRNAs play essential roles in regulating plant growth and development. In recent years, research into the mechanism and consequences of miRNA action has made great progress. With whole genome sequence available in such plants as Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Glycine max, etc., it is desirable to develop a plant miRNA database through the integration of large amounts of information about publicly deposited miRNA data. The plant miRNA database (PMRD) integrates available plant miRNA data deposited in public databases, gleaned from the recent literature, and data generated in-house. This database contains sequence information, secondary structure, target genes, expression profiles and a genome browser. In total, there are 8433 miRNAs collected from 121 plant species in PMRD, including model plants and major crops such as Arabidopsis, rice, wheat, soybean, maize, sorghum, barley, etc. For Arabidopsis, rice, poplar, soybean, cotton, medicago and maize, we included the possible target genes for each miRNA with a predicted interaction site in the database. Furthermore, we provided miRNA expression profiles in the PMRD, including our local rice oxidative stress related microarray data (LC Sciences miRPlants_10.1) and the recently published microarray data for poplar, Arabidopsis, tomato, maize and rice. The PMRD database was constructed by open source technology utilizing a user-friendly web interface, and multiple search tools. The PMRD is freely available at http://bioinformatics.cau.edu.cn/PMRD. We expect PMRD to be a useful tool for scientists in the miRNA field in order to study the function of miRNAs and their target genes, especially in model plants and major crops.
Project description:Two transcribed retrocopies of the fibroblast growth factor 4 (FGF4) gene have previously been described in the domestic dog. An FGF4 retrocopy on chr18 is associated with disproportionate dwarfism, while an FGF4 retrocopy on chr12 is associated with both disproportionate dwarfism and intervertebral disc disease (IVDD). In this study, whole-genome sequencing data were queried to identify other FGF4 retrocopies that could be contributing to phenotypic diversity in canids. Additionally, dogs with surgically confirmed IVDD were assayed for novel FGF4 retrocopies. Five additional and distinct FGF4 retrocopies were identified in canids including a copy unique to red wolves (Canis rufus). The FGF4 retrocopies identified in domestic dogs were identical to domestic dog FGF4 haplotypes, which are distinct from modern wolf FGF4 haplotypes, indicating that these retrotransposition events likely occurred after domestication. The identification of multiple, full length FGF4 retrocopies with open reading frames in canids indicates that gene retrotransposition events occur much more frequently than previously thought and provide a mechanism for continued genetic and phenotypic diversity in canids.
Project description:LINE-1 is an active transposable element encoding proteins capable of inserting host gene retrocopies, resulting in retro-copy number variants (retroCNVs) between individuals. Here, we performed retroCNV discovery using 86 equids and identified 437 retrocopy insertions. Only 5 retroCNVs were shared between horses and other equids, indicating that the majority of retroCNVs inserted after the species diverged. A large number (17–35 copies) of segmentally duplicated Ligand Dependent Nuclear Receptor Corepressor Like (LCORL) retrocopies were present in all equids but absent from other extant perissodactyls. The majority of LCORL transcripts in horses and donkeys originate from the retrocopies. The initial LCORL retrotransposition occurred 18 million years ago (95% CI: 17–19), which is coincident with the increase in body size, reduction in digit number, and changes in dentition that characterized equid evolution. Evolutionary conservation of the LCORL retrocopy segmental amplification in the Equidae family, high expression levels and the ancient timeline for LCORL retrotransposition support a functional role for this structural variant.
Project description:BACKGROUND: Sireviruses are an ancient genus of the Copia superfamily of LTR retrotransposons, and the only one that has exclusively proliferated within plant genomes. Based on experimental data and phylogenetic analyses, Sireviruses have successfully infiltrated many branches of the plant kingdom, extensively colonizing the genomes of grass species. Notably, it was recently shown that they have been a major force in the make-up and evolution of the maize genome, where they currently occupy ~21% of the nuclear content and ~90% of the Copia population. It is highly likely, therefore, that their life dynamics have been fundamental in the genome composition and organization of a plethora of plant hosts. To assist studies into their impact on plant genome evolution and also facilitate accurate identification and annotation of transposable elements in sequencing projects, we developed MASiVEdb (Mapping and Analysis of SireVirus Elements Database), a collective and systematic resource of Sireviruses in plants. DESCRIPTION: Taking advantage of the increasing availability of plant genomic sequences, and using an updated version of MASiVE, an algorithm specifically designed to identify Sireviruses based on their highly conserved genome structure, we populated MASiVEdb (http://bat.infspire.org/databases/masivedb/) with data on 16,243 intact Sireviruses (total length >158Mb) discovered in 11 fully-sequenced plant genomes. MASiVEdb is unlike any other transposable element database, providing a multitude of highly curated and detailed information on a specific genus across its hosts, such as complete set of coordinates, insertion age, and an analytical breakdown of the structure and gene complement of each element. All data are readily available through basic and advanced query interfaces, batch retrieval, and downloadable files. 
A purpose-built system is also offered for detecting and visualizing similarity between user sequences and Sireviruses, as well as for coding domain discovery and phylogenetic analysis. CONCLUSION: MASiVEdb is currently the most comprehensive directory of Sireviruses, and as such complements other efforts in cataloguing plant transposable elements and elucidating their role in host genome evolution. Such insights will gradually deepen, as we plan to further improve MASiVEdb by phylogenetically mapping Sireviruses into families, by including data on fragments and solo LTRs, and by incorporating elements from newly-released genomes.
Project description:Transfer RNAs (tRNAs) are intermediate-sized non-coding RNAs found in all organisms that help translate messenger RNA into protein. Recently, the number of sequenced plant genomes has increased dramatically. The availability of this extensive data greatly accelerates the study of tRNAs on a large scale. Here, 8,768,261 scaffolds/chromosomes containing 229,093 giga-base pairs representing whole-genome sequences of 256 plant species were analyzed to identify tRNA genes. As a result, 331,242 nuclear, 3,216 chloroplast, and 1,467 mitochondrial tRNA genes were identified. The nuclear tRNA genes include 275,134 tRNAs decoding 20 standard amino acids, 1,325 suppressor tRNAs, 6,273 tRNAs with unknown isotypes, 48,475 predicted pseudogenes, and 37,873 tRNAs with introns. Efforts also extended to the creation of PltRNAdb (https://bioinformatics.um6p.ma/PltRNAdb/index.php), a data source for tRNA genes from 256 plant species. PltRNAdb website allows researchers to search, browse, visualize, BLAST, and download predicted tRNA genes. PltRNAdb will help improve our understanding of plant tRNAs and open the door to discovering the unknown regulatory roles of tRNAs in plant genomes.
Project description:Pre-clinical research builds on a large variety of in vivo and ex vivo tools such as non-invasive imaging, microscopy, and analysis of gene expression. To work efficiently with multimodal data and correlate results across scales, it is particularly important to have easy access to all data points from different specimens, e.g. the magnetic resonance imaging (MRI) data from different time points and the post-mortem histology. This requires efficient data management that is customizable and designed to relate all applied methods, raw data and analyses to one specific animal. Despite increasing demands to handle such complex data, most pre-clinical labs have not yet established such an electronic database. Here, we present a novel cloud-based relational database for multimodal animal data, which operates on commercial software. We have implemented data fields for various pre-clinical features such as MRI, histology and behaviour. Automated procedures replace manual and recurrent calculations. Pre-set plotting and printing features provide efficient analysis and documentation. The database template is useful for all labs working with laboratory animals, and its adaptation to specific research projects requires no prior scripting expertise. The database runs in the web browser independently of the operating system and allows multiple users to work simultaneously. Data entry is monitored and restricted for particular tests according to the user management, for example to keep users blinded to the experimental group during the experiment. The database improves data accessibility, standardization of data recording and data handling efficiency in pre-clinical research.
Project description:Background: Long noncoding RNAs (lncRNAs) have attracted significant attention in recent years due to their important roles in many biological processes. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. With improving sequencing technologies, numerous domestic-animal lncRNAs are now available. Thus, there is an immediate need for a database resource that can assist researchers to store, organize, analyze and visualize domestic-animal lncRNAs. Results: The domestic-animal lncRNA database, named ALDB, is the first comprehensive database with a focus on the domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to the annotations of lincRNAs, it offers related data that is not available yet in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functionalities, are available to help users effectively explore, analyze and download data related to domestic-animal lncRNAs. Conclusions: ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. A user-friendly web interface, integrated information and tools make it valuable to researchers in their studies. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.