Project description:Cardiovascular diseases (CVDs) account for high morbidity and mortality worldwide. Both genetic and epigenetic factors are involved in various cardiovascular diseases. In recent years, a vast amount of multi-omics data has accumulated in the field of cardiovascular research, yet key mechanistic aspects of CVDs remain poorly understood. Hence, a comprehensive online resource is required to consolidate previous research findings and to derive novel approaches for understanding disease pathophysiology. Here, we have developed a literature-based database, CardioGenBase, collecting gene-disease associations from PubMed and MEDLINE. The database covers major cardiovascular diseases such as cerebrovascular disease, coronary artery disease (CAD), hypertensive heart disease, inflammatory heart disease, ischemic heart disease and rheumatic heart disease. It contains ~1,500 cardiovascular disease genes from ~2,4000 research articles. For each gene, literature evidence, ontology, pathways, single nucleotide polymorphisms, protein-protein interaction networks, normal gene expression, and protein expression in various body fluids and tissues are provided. In addition, tools such as a gene-disease association finder and a gene expression finder are available to users, with figures, tables, maps and Venn diagrams to fit their needs. To our knowledge, CardioGenBase is the only database to provide gene-disease associations for the above-mentioned major cardiovascular diseases in a single portal. CardioGenBase is a vital online resource to support genome-wide analysis and genetic, epigenetic and pharmacological studies.
Project description:Next-generation sequencing technologies and the availability of an increasing number of mammalian and other genomes allow gene expression studies, particularly RNA sequencing, in many non-model organisms. However, incomplete genome annotation and incomplete assignment of genes to functional annotation databases can lead to a substantial loss of information in downstream data analysis. To overcome this, we developed the Mammalian Annotation Database tool (MAdb, https://madb.ethz.ch) to conveniently provide homologous gene information for selected mammalian species. The assignment between species is performed in three steps: (i) matching official gene symbols, (ii) using ortholog information contained in Ensembl Compara and (iii) pairwise BLAST comparisons of all transcripts. In addition, we developed a new tool (AnnOverlappeR) for the reliable assignment of National Center for Biotechnology Information (NCBI) and Ensembl gene IDs. Gene lists translated to gene IDs of a well-annotated species such as human can be used for improved functional annotation with relevant tools based on Gene Ontology and molecular pathway information. We tested MAdb on a published RNA-seq data set for the pig and showed clearly improved overrepresentation analysis results based on the assigned human homologous gene identifiers. Using MAdb yielded a similar list of human homologous genes and similar functional annotation results regardless of whether the starting gene IDs came from NCBI or Ensembl. The MAdb database is accessible via a web interface and a Galaxy application.
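The three-step assignment described above can be pictured as a lookup cascade. The following is only a minimal sketch under stated assumptions, not MAdb's actual implementation: it assumes the steps act as ordered fallbacks (the abstract does not say how conflicts between steps are resolved), and the function name, the lookup tables and the gene IDs are all hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple


def assign_human_homolog(
    gene_id: str,
    symbol: Optional[str],
    human_symbols: Dict[str, str],
    compara_orthologs: Dict[str, str],
    blast_best_hits: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (human gene ID or None, name of the step that produced it).

    All three lookup tables are hypothetical, precomputed inputs:
    human_symbols     maps an upper-case gene symbol to a human gene ID,
    compara_orthologs maps a source gene ID to a human ortholog ID,
    blast_best_hits   maps a source gene ID to its best human BLAST hit.
    """
    # Step (i): match on the official gene symbol (case-insensitive here,
    # which is an assumption of this sketch).
    if symbol is not None:
        hit = human_symbols.get(symbol.upper())
        if hit is not None:
            return hit, "symbol"

    # Step (ii): fall back to precomputed ortholog information.
    hit = compara_orthologs.get(gene_id)
    if hit is not None:
        return hit, "compara"

    # Step (iii): fall back to the best pairwise BLAST hit.
    hit = blast_best_hits.get(gene_id)
    if hit is not None:
        return hit, "blast"

    return None, "unassigned"


if __name__ == "__main__":
    # Toy data with made-up identifiers, for illustration only.
    human_symbols = {"TP53": "HUMAN_G1"}
    compara = {"PIG_G2": "HUMAN_G2"}
    blast = {"PIG_G3": "HUMAN_G3"}
    for gid, sym in [("PIG_G1", "tp53"), ("PIG_G2", None), ("PIG_G4", None)]:
        print(gid, assign_human_homolog(gid, sym, human_symbols, compara, blast))
```

Returning the name of the step alongside the ID keeps the provenance of each assignment visible, which matters when symbol matches and BLAST hits deserve different levels of trust.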
Project description:The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/
Project description:Liquid biopsy has emerged as a promising non-invasive approach for detecting, monitoring diseases, and predicting their recurrence. However, the effective utilization of liquid biopsy data to identify reliable biomarkers for various cancers and other diseases requires further exploration. Here, we present cfOmics, a web-accessible database (https://cfomics.ncRNAlab.org/) that integrates comprehensive multi-omics liquid biopsy data, including cfDNA, cfRNA based on next-generation sequencing, and proteome, metabolome based on mass-spectrometry data. As the first multi-omics database in the field, cfOmics encompasses a total of 17 distinct data types and 13 specimen variations across 69 disease conditions, with a collection of 11345 samples. Moreover, cfOmics includes reported potential biomarkers for reference. To facilitate effective analysis and visualization of multi-omics data, cfOmics offers powerful functionalities to its users. These functionalities include browsing, profile visualization, the Integrative Genomic Viewer, and correlation analysis, all centered around genes, microbes, or end-motifs. The primary objective of cfOmics is to assist researchers in the field of liquid biopsy by providing comprehensive multi-omics data. This enables them to explore cell-free data and extract profound insights that can significantly impact disease diagnosis, treatment monitoring, and management.
Project description:The overwhelming fraction of proteins whose sequences have been collected in comprehensive databases may never be assessed for function experimentally. Commonly, putative function is assigned based on similarity to experimentally characterized homologs, either on the level of the entire protein or for single evolutionarily conserved domains. The annotation of individual sites provides more detailed insights regarding the correspondence between sequence and function, as well as context for the interpretation of sequence variation and the outcomes of experiments. In general, site annotation has to be extracted from the published literature, and can often be transferred to closely related sequence neighbors. The National Center for Biotechnology Information's Conserved Domain Database (CDD) provides a system for curators to record functional sites (such as active sites or binding sites for cofactors) or characteristic sites (such as signature motifs), which are conserved across domain families, and for the transfer of that annotation to protein database sequences via high-confidence domain matches. Recently, CDD curators have begun to sort site annotations into seven categories (active, polypeptide binding, nucleic acid binding, ion binding, chemical binding, post-translational modification and other) and here we present a first comparative analysis of sites obtained via domain model matches, juxtaposed with existing site annotation encountered in high-quality data sets. Site annotation derived from domain annotation has the potential to cover large fractions of protein sequences, and we observe that CDD-based site annotation complements existing site annotation in many cases, which may, in part, originate from CDD's curation practice of collecting sites conserved across diverse taxa and supported by evidence from multiple 3D structures.
Project description:Annotating functional terms with individual domains is essential for understanding the functions of full-length proteins. We describe SDADB, a functional annotation database for structural domains. SDADB provides associations between gene ontology (GO) terms and SCOP domains calculated with an integrated framework. GO annotations are assigned probabilities of being correct, which are estimated with a Bayesian network by taking advantage of structural neighborhood mappings, SCOP-InterPro domain mapping information, position-specific scoring matrices (PSSMs) and sequence homolog features, with the most substantial contribution coming from high-coverage structure-based domain-protein mappings. The domain-protein mappings are computed using large-scale structure alignment. SDADB contains ontological terms with probabilistic scores for more than 214 000 distinct SCOP domains. It also provides additional features, including 3D structure alignment visualization, a GO hierarchical tree view, and search, browse and download options. Database URL: http://sda.denglab.org.
Project description:Gene expression patterns help to measure and characterize the effect of environmental perturbations at the cellular and organism levels. Complicating interpretation is the presence of uncharacterized or "hypothetical" gene functions for a large percentage of genomes. This is particularly evident in Daphnia genomes, which contain many regions coding for "hypothetical proteins" and are significantly divergent from many of the available arthropod model species, but might be ecologically important. In the present study, we developed a gene expression database, the Daphnia stressor database (http://www.daphnia-stressordb.uni-hamburg.de/dsdbstart.php), built from 90 published studies on Daphnia gene expression. Using a comparative genomics approach, we used the database to annotate D. galeata transcripts. The extensive body of literature available for Daphnia species allowed us to associate stressors with gene expression patterns. We believe that our stressor-based annotation strategy allows for better understanding and interpretation of the functional role of the understudied hypothetical or uncharacterized Daphnia genes, thereby increasing our understanding of Daphnia's genetic and phenotypic variability.
Project description:The SURFACE (SUrface Residues and Functions Annotated, Compared and Evaluated, URL http://cbm.bio.uniroma2.it/surface/) database is a repository of annotated and compared protein surface regions. SURFACE contains the results of a large-scale protein annotation and local structural comparison project. A non-redundant set of protein chains is used to build a database of protein surface patches, defined as putative surface functional sites. Each patch is annotated with sequence and structure-derived information about function or interaction abilities. A new procedure for structure comparison is used to perform an all-versus-all patches comparison. Selection of the results obtained with stringent parameters offers a similarity score that can be used to associate different patches and allows reliable annotation by similarity. Annotation exerted through the comparison of regions of protein surface allows the highlighting of similarities that cannot be recognized by other methods of sequence or structure comparison. A graphic representation of the surface patches, functional annotations and the structural superpositions is available through the web interface.
Project description:PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (approximately 70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily-specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/.
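To make the notion of a PROSITE pattern concrete, the sketch below translates the basic pattern syntax into a Python regular expression. It is an illustrative simplification, not how ScanProsite works: the function name is hypothetical, only the common elements are handled (single residues, x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) repetition, and the < and > terminus anchors), and rarer forms such as an anchor inside square brackets are not supported. Profiles and ProRule conditions are outside its scope.

```python
import re

# One pattern element: a core (residue, x, [..] or {..}) plus optional repetition.
_ELEMENT = re.compile(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern string to a regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns are terminated by a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]

        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()

        if core == "x":  # any residue
            core_re = "."
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            core_re = "[^" + core[1:-1] + "]"
        else:  # a single residue or a [..] set of allowed residues
            core_re = core

        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core_re + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    # N-glycosylation site pattern, written in PROSITE syntax.
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    print(bool(re.search(regex, "AANGSAA")))
```

Regular expressions of this kind report only non-overlapping matches when used with `re.finditer`, so a real scanner that needs overlapping sites would have to advance the search position one residue at a time.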
Project description:Background: Autoimmune diseases are heterogeneous pathologies with difficult diagnosis and few therapeutic options. In the last decade, several omics studies have provided significant insights into the molecular mechanisms of these diseases. Nevertheless, data from different cohorts and pathologies are stored independently in public repositories and a unified resource is imperative to assist researchers in this field. Results: Here, we present Autoimmune Diseases Explorer (https://adex.genyo.es), a database that integrates 82 curated transcriptomics and methylation studies covering 5609 samples for some of the most common autoimmune diseases. The database provides, in an easy-to-use environment, advanced data analysis and statistical methods for exploring omics datasets, including meta-analysis, differential expression or pathway analysis. Conclusions: This is the first omics database focused on autoimmune diseases. This resource incorporates homogeneously processed data to facilitate integrative analyses among studies.