Project description:Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), that improves the accuracy of gene expression prediction by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models using SUMMIT-FA with a comprehensive functional database MACIE and the eQTL summary-level data from the eQTLGen consortium. By applying the resulting models to GWASs for 24 complex traits and evaluating the method in a simulation study, we show that SUMMIT-FA improves the accuracy of gene expression prediction models in whole blood, identifies significantly more gene-trait associations, and improves predictive power for identifying "silver standard" genes compared to several benchmark methods.
Project description:BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Project description:The roadblock/LC7 dynein light chain is a ubiquitous component of all dyneins and is essential for many diverse processes including proper axonal transport and dendrite growth. In addition, LC7 functions in non-dynein transcriptional activation of the transforming growth factor-beta complex. Crystal structures of Drosophila melanogaster LC7 in the apo form and in complex with a segment of the disordered N-terminal domain of dynein intermediate chain (IC) provide the first definitive identification of the IC sequence recognized by LC7. The site, confirmed by isothermal titration calorimetry studies, overlaps the IC sequence considered in the literature to be an IC self-association domain. The IC peptide binds as two amphipathic helices that lie along an extensive hydrophobic cleft on LC7 and ends with a polar side-chain interaction network that includes conserved residues from both proteins. The LC7 recognition sequence on IC and its interface with LC7 are well conserved and are, thus, likely representative of all IC x LC7 structures. Interestingly, the position of bound IC in the IC x LC7 complex mimics a helix that is integrated into the primary structure in distantly related LC7 homologs. The IC x LC7 structure further shows that the naturally occurring robl(Z) deletion mutation contains the majority of the IC binding site and suggests that promotion of IC binding by phosphorylation of LC7 is an indirect effect.
Project description:The gamma-proteobacterium Shewanella oneidensis strain MR-1 is a metabolically versatile organism that can reduce a wide range of organic compounds, metal ions, and radionuclides. Similar to most other sequenced organisms, approximately 40% of the predicted ORFs in the S. oneidensis genome were annotated as uncharacterized "hypothetical" genes. We implemented an integrative approach by using experimental and computational analyses to provide more detailed insight into gene function. Global expression profiles were determined for cells after UV irradiation and under aerobic and suboxic growth conditions. Transcriptomic and proteomic analyses confidently identified 538 hypothetical genes as expressed in S. oneidensis cells both as mRNAs and proteins (33% of all predicted hypothetical proteins). Publicly available analysis tools and databases and the expression data were applied to improve the annotation of these genes. The annotation results were scored by using a seven-category schema that ranked both confidence and precision of the functional assignment. We were able to identify homologs for nearly all of these hypothetical proteins (97%), but could confidently assign exact biochemical functions for only 16 proteins (category 1; 3%). Altogether, computational and experimental evidence provided functional assignments or insights for 240 more genes (categories 2-5; 45%). These functional annotations advance our understanding of genes involved in vital cellular processes, including energy conversion, ion transport, secondary metabolism, and signal transduction. We propose that this integrative approach offers a valuable means to undertake the enormous challenge of characterizing the rapidly growing number of hypothetical proteins with each newly sequenced genome.
Project description:Microbiology depends on the availability of annotated microbial genomes for many applications. Comparative genomics approaches have been a major advance, but consistent and accurate annotations of genomes can be hard to obtain. In addition, newer concepts such as the pan-genome concept are still being implemented to help answer biological questions. Hence, we present proGenomes2, which provides 87 920 high-quality genomes in a user-friendly and interactive manner. Genome sequences and annotations can be retrieved individually or by taxonomic clade. Every genome in the database has been assigned to a species cluster and most genomes could be accurately assigned to one or multiple habitats. In addition, general functional annotations and specific annotations of antibiotic resistance genes and single nucleotide variants are provided. In short, proGenomes2 provides threefold more genomes, enhanced habitat annotations, updated taxonomic and functional annotation and improved linkage to the NCBI BioSample database. The database is available at http://progenomes.embl.de/.
Project description:Loss-of-function phenotypes are widely used to infer gene function using the principle that similar phenotypes are indicative of similar functions. However, converting phenotypic to functional annotations requires careful interpretation of phenotypic descriptions and assessment of phenotypic similarity. Understanding how functions and phenotypes are linked will be crucial for the development of methods for the automatic conversion of gene loss-of-function phenotypes to gene functional annotations. We explored the relation between cellular phenotypes from RNAi-based screens in human cells and gene annotations of cellular functions as provided by the Gene Ontology (GO). Comparing different similarity measures, we found that information content-based measures of phenotypic similarity were the best at capturing gene functional similarity. However, phenotypic similarities did not map to the Gene Ontology organization of gene function but to functions defined as groups of GO terms with shared gene annotations. Our observations have implications for the use and interpretation of phenotypic similarities as a proxy for gene functions both in RNAi screen data analysis and curation and in the prediction of disease genes.
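To make "information content-based" similarity concrete, the sketch below implements one well-known measure of this family, Resnik similarity, on a toy phenotype ontology. The abstract does not say which measures were compared, so the choice of Resnik is an assumption, and the ontology terms, gene names, and function names are all hypothetical.

```python
import math

# Hypothetical toy phenotype ontology: term -> set of parent terms.
PARENTS = {
    "root": set(),
    "cell_cycle_defect": {"root"},
    "mitotic_arrest": {"cell_cycle_defect"},
    "binucleate": {"cell_cycle_defect"},
    "migration_defect": {"root"},
}

# Hypothetical gene -> phenotype annotations (not real screen data).
GENE_TERMS = {
    "g1": {"mitotic_arrest"},
    "g2": {"binucleate"},
    "g3": {"migration_defect"},
    "g4": {"migration_defect"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(gene_terms):
    """IC(t) = log(N / n_t), where n_t counts genes annotated with t or a descendant."""
    n_genes = len(gene_terms)
    counts = {}
    for terms in gene_terms.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] = counts.get(t, 0) + 1
    return {t: math.log(n_genes / c) for t, c in counts.items()}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(t, 0.0) for t in common)


IC = information_content(GENE_TERMS)
SIM_RELATED = resnik("mitotic_arrest", "binucleate", IC)
SIM_UNRELATED = resnik("mitotic_arrest", "migration_defect", IC)
```

In this toy setup the two cell-cycle phenotypes share a fairly specific ancestor and therefore score higher than a pair whose only common ancestor is the root, which carries no information.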
Project description:Metallothionein-3 (MT-3), a member of the mammalian metallothionein (MT) family, is mainly expressed in the central nervous system (CNS). MT-3 possesses a unique neuronal growth inhibitory activity, and the levels of this intra- and extracellularly occurring metalloprotein are markedly diminished in the brain of patients affected by a number of metal-linked neurodegenerative disorders, including Alzheimer's disease (AD). In these pathologies, the redox cycling of copper, accompanied by the production of reactive oxygen species (ROS), plays a key role in the neuronal toxicity. Although MT-3 shares the metal-thiolate clusters with the well-characterized MT-1 and MT-2, it shows distinct biological, structural and chemical properties. Owing to its anti-oxidant properties and modulator function not only for Zn, but also for Cu in the extra- and intracellular space, MT-3, but not MT-1/MT-2, protects neuronal cells from the toxicity of various Cu(II)-bound amyloids. In recent years, the roles of zinc dynamics and MT-3 function in neurodegeneration have slowly begun to emerge. This short review focuses on the recent developments regarding the chemistry and biology of MT-3.
Project description:The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced present new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence-structure-function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function ontologies. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer.
Project description:Somatic cell reprogramming and tissue repair share relevant factors and molecular programs. Here, Dickkopf-3 (DKK3) is identified as a novel factor for organ regeneration using combined transcription-factor-induced reprogramming and RNA-interference techniques. Loss of Dkk3 enhances the generation of induced pluripotent stem cells but does not affect de novo derivation of embryonic stem cells, three-germ-layer differentiation or colony formation capacity of liver and pancreatic organoids. However, DKK3 expression levels in wildtype animals and serum levels in human patients are elevated upon injury. Accordingly, Dkk3-null mice display less liver damage upon acute and chronic failure mediated by increased proliferation in hepatocytes and LGR5+ liver progenitor cell population, respectively. Similarly, recovery from experimental pancreatitis is accelerated. Regeneration onset occurs in the acinar compartment accompanied by virtually abolished canonical-Wnt-signaling in Dkk3-null animals. This results in reduced expression of the Hedgehog repressor Gli3 and increased Hedgehog-signaling activity upon Dkk3 loss. Collectively, these data reveal Dkk3 as a key regulator of organ regeneration via a direct, previously unacknowledged link between DKK3, canonical-Wnt-, and Hedgehog-signaling.
Project description:BACKGROUND: Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. RESULTS: In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. CONCLUSION: We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at http://www.cnb.csic.es/~monica/coherence/
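The general idea of scoring a protein set's functional coherence and testing it against a reference set can be sketched as follows. This is not the authors' method: the abstract does not specify the similarity measure or the test, so the sketch assumes Jaccard similarity between annotation sets and a simple permutation test against random same-size sets. All protein names, annotation terms, and function names are hypothetical.

```python
import random
from itertools import combinations

# Hypothetical reference set: protein -> set of annotation terms.
ANNOTATIONS = {
    "p1": {"ribosome", "translation"},
    "p2": {"ribosome", "translation"},
    "p3": {"ribosome", "translation"},
    "p4": {"glycolysis"},
    "p5": {"dna_repair"},
    "p6": {"transport"},
}


def jaccard(a, b):
    """Jaccard similarity of two annotation sets (an assumed stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence(proteins, annotations):
    """Mean pairwise annotation similarity over all protein pairs in the set."""
    pairs = list(combinations(proteins, 2))
    if not pairs:
        raise ValueError("need at least two proteins")
    total = sum(jaccard(annotations[p], annotations[q]) for p, q in pairs)
    return total / len(pairs)


def coherence_p_value(proteins, annotations, n_random=1000, seed=0):
    """Empirical p-value: how often a random same-size set is at least as coherent."""
    rng = random.Random(seed)
    reference = sorted(annotations)
    observed = coherence(proteins, annotations)
    hits = sum(
        coherence(rng.sample(reference, len(proteins)), annotations) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_random + 1)


score, p_value = coherence_p_value(["p1", "p2", "p3"], ANNOTATIONS)
```

Here the three ribosomal proteins form a maximally coherent set, and few random triples from the reference match that score, so the empirical p-value is small. A real analysis would use a much larger reference set and an ontology-aware similarity measure.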