Project description:The orexin neurohormones control a variety of important physiological processes, including appetite and feeding, wakefulness and energy homeostasis, by signaling through two related G protein-coupled receptors. Pharmacological manipulation of orexin signaling is an important goal. Here we describe the isolation of orexin receptor ligands from a library of microarray-displayed peptoids via a novel two-color, cell-based screen. Functional analysis of derivatives of these "hits" resulted in the development of moderate-potency, low-molecular-weight receptor antagonists. Moreover, further optimization efforts resulted in the fortuitous discovery of a compound that potentiates the activity of the receptor. This compound is the first small molecule reported to up-regulate orexin signaling.
Project description:Small noncoding RNAs (sRNAs/sncRNAs) are generated from different genomic loci and play important roles in biological processes, such as cell proliferation and the regulation of gene expression. Next-generation sequencing (NGS) has provided an unprecedented opportunity to discover and quantify diverse kinds of sncRNA, such as tRFs (tRNA-derived small RNA fragments), phasiRNAs (phased, secondary, small-interfering RNAs), Piwi-interacting RNAs (piRNAs) and plant-specific 24-nt short interfering RNAs (siRNAs). However, currently available web-based tools do not provide approaches to comprehensively analyze all of these diverse sncRNAs. This study presents a novel integrated platform, sRNAtools (https://bioinformatics.caf.ac.cn/sRNAtools), that can be used in conjunction with high-throughput sequencing to identify and functionally annotate sncRNAs, including profiling microRNAs, piRNAs, tRNAs, small nuclear RNAs, small nucleolar RNAs and rRNAs and discovering isomiRs, tRFs, phasiRNAs and plant-specific 24-nt siRNAs for up to 21 model organisms. Different modules, including single case, batch case, group case and target case, are developed to provide users with flexible ways of studying sncRNA. In addition, sRNAtools supports different ways of uploading small RNA sequencing data in an interactive queue system, while local versions based on the program package/Docker/VirtualBox are also available. We believe that sRNAtools will greatly benefit the scientific community as an integrated tool for studying sncRNAs.
Project description:Isolated or syndromic congenital cataracts are heterogeneous developmental defects, making the identification of the associated genes challenging. In the past, mouse lens expression microarrays have been successfully applied in bioinformatics tools (e.g., iSyTE) to facilitate human cataract-associated gene discovery. To develop a new resource for geneticists, we report high-throughput RNA sequencing (RNA-seq) profiles of mouse lens at key embryonic stages (E)10.5 (lens pit), E12.5 (primary fiber cell differentiation), E14.5 and E16.5 (secondary fiber cell differentiation). These stages capture important events as the lens develops from an invaginating placode into a transparent tissue. Previously, in silico whole-embryo body (WB)-subtraction-based "lens-enriched" expression has been effective in prioritizing cataract-linked genes. To apply an analogous approach, we generated new mouse WB RNA-seq datasets and show that in silico WB subtraction of lens RNA-seq datasets successfully identifies key genes based on lens-enriched expression. At ≥2 counts-per-million expression, ≥1.5 log2 fold-enrichment (p < 0.05) cutoff, E10.5 lens exhibits 1401 enriched genes (17% lens-expressed genes), E12.5 lens exhibits 1937 enriched genes (22% lens-expressed genes), E14.5 lens exhibits 2514 enriched genes (31% lens-expressed genes), and E16.5 lens exhibits 2745 enriched genes (34% lens-expressed genes). Biological pathway analysis identified genes associated with lens development, transcription regulation and signaling pathways, among other functional groups. Furthermore, these new RNA-seq data confirmed high expression of established cataract-linked genes and identified new potential regulators in the lens. 
Finally, we developed new lens stage-specific UCSC Genome Browser annotation tracks and made these publicly accessible through iSyTE ( https://research.bioinformatics.udel.edu/iSyTE/ ) for user-friendly visualization of lens gene expression/enrichment to prioritize genes from high-throughput data from cataract cases.
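The lens-enrichment cutoffs quoted above (≥2 counts-per-million, ≥1.5 log2 fold-enrichment, p < 0.05) can be illustrated with a minimal sketch. The record fields (`lens_cpm`, `wb_cpm`, `p`), the gene names and the pseudocount are hypothetical choices made for illustration; the abstract does not state how the fold-enrichment or p-values were computed, so the p-value is taken here as a precomputed input.

```python
import math

def lens_enriched(genes, min_cpm=2.0, min_log2_fc=1.5, max_p=0.05, pseudocount=1.0):
    """Return names of genes that pass all three lens-enrichment cutoffs.

    Each gene is a dict with hypothetical keys:
      lens_cpm: expression in lens (counts per million)
      wb_cpm:   expression in whole-embryo body (counts per million)
      p:        precomputed p-value for the lens vs. WB comparison
    The pseudocount is an assumption; it only avoids division by zero.
    """
    hits = []
    for g in genes:
        if g["lens_cpm"] < min_cpm:
            continue  # not expressed in lens at the required level
        log2_fc = math.log2((g["lens_cpm"] + pseudocount) / (g["wb_cpm"] + pseudocount))
        if log2_fc >= min_log2_fc and g["p"] < max_p:
            hits.append(g["name"])
    return hits

# Made-up records, one per way of passing or failing the filter.
example_genes = [
    {"name": "geneA", "lens_cpm": 50.0, "wb_cpm": 5.0, "p": 0.001},   # passes
    {"name": "geneB", "lens_cpm": 1.0, "wb_cpm": 0.1, "p": 0.001},    # below CPM cutoff
    {"name": "geneC", "lens_cpm": 20.0, "wb_cpm": 10.0, "p": 0.001},  # enrichment too low
    {"name": "geneD", "lens_cpm": 40.0, "wb_cpm": 4.0, "p": 0.2},     # not significant
]
enriched = lens_enriched(example_genes)
```

Applying the three cutoffs jointly, rather than ranking on fold-enrichment alone, keeps weakly expressed genes with unstable ratios out of the enriched set.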
Project description:The study of RNA modifications in large clinical cohorts can reveal relationships between the epitranscriptome and human diseases, although this is especially challenging. We developed ModTect (https://github.com/ktan8/ModTect), a statistical framework to identify RNA modifications de novo by standard RNA-sequencing with deletion and mis-incorporation signals. We show that ModTect can identify both known (N1-methyladenosine) and previously unknown types of mRNA modifications (N2,N2-dimethylguanosine) at nucleotide-resolution. Applying ModTect to 11,371 patient samples and 934 cell lines across 33 cancer types, we show that the epitranscriptome was dysregulated in patients across multiple cancer types and was additionally associated with cancer progression and survival outcomes. Some types of RNA modification were also more disrupted than others in patients with cancer. Moreover, RNA modifications contribute to multiple types of RNA-DNA sequence differences, which unexpectedly escape detection by Sanger sequencing. ModTect can thus be used to discover associations between RNA modifications and clinical outcomes in patient cohorts.
Project description:Analysis of the T-cell receptor (TCR) repertoire is essential to characterize the extensive collections of antigen-recognizing T-cell populations in cancer research, and whole transcriptome sequencing (WTS) and immune repertoire sequencing (IR-seq) are commonly used for this purpose. To date, no standard read filtering method for IR measurement has been presented. We assessed the diversity of the TCR repertoire results from the paired WTS and IR-seq data of 31 multiple myeloma (MM) patients. To devise an adequate read filtering strategy for IR analysis, we conducted comparisons with WTS results. First, our analyses for determining an optimal threshold for selecting clonotypes showed that the clonotypes supported by a single read largely affected the shared clonotypes and manifested distinct patterns of mapping qualities, unlike clonotypes with multiple reads. Second, although IR-seq could reflect a wider TCR region with a higher capture rate than WTS, an adequate comparison with the removal of unwanted bias from potential sequencing errors was possible only after applying our read filtering strategy. As a result, we suggest that TCR repertoire analysis be carried out through IR-seq to produce reliable and accurate results, along with the removal of single-read clonotypes, to conduct immune research in cancer using high-throughput sequencing.
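The recommended filter, removal of clonotypes supported by a single read, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `min_reads` parameter and the example clonotype strings are all hypothetical, and each read is assumed to have already been assigned to a clonotype by an upstream tool.

```python
from collections import Counter

def count_clonotypes(read_clonotypes):
    """Count supporting reads per clonotype from one clonotype label per read."""
    return Counter(read_clonotypes)

def drop_single_read_clonotypes(counts, min_reads=2):
    """Keep only clonotypes supported by at least `min_reads` reads."""
    return {clonotype: n for clonotype, n in counts.items() if n >= min_reads}

def shared_clonotypes(counts_a, counts_b):
    """Clonotypes present in both repertoires (e.g. WTS and IR-seq of one patient)."""
    return set(counts_a) & set(counts_b)

# Made-up clonotype labels for one hypothetical patient.
wts_reads = ["CASSLGQF", "CASSLGQF", "CASRTGEF", "CASSPDRF"]
ir_reads = ["CASSLGQF", "CASSLGQF", "CASSLGQF", "CASRTGEF", "CASRTGEF", "CASSPDRF"]

wts = count_clonotypes(wts_reads)
ir = count_clonotypes(ir_reads)

shared_before = shared_clonotypes(wts, ir)
shared_after = shared_clonotypes(
    drop_single_read_clonotypes(wts), drop_single_read_clonotypes(ir)
)
```

Comparing `shared_before` with `shared_after` shows how strongly single-read clonotypes can influence the apparent overlap between the two sequencing methods.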
Project description:The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was "protein processing in endoplasmic reticulum". Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. 
Moreover, remarkable variants were also found in candidate genes coding for major milk proteins and proteins related to milk fat metabolism. Several of the SNPs found in this study could be included as suitable markers in genotyping platforms or custom SNP arrays to perform association analyses in commercial populations and apply genomic selection protocols in the dairy production industry.
Project description:The lack of accessible and structured documentation creates major barriers for investigators interested in understanding, properly interpreting and analyzing cohort data and biological samples. Providing the scientific community with open information is essential to optimize usage of these resources. A cataloguing toolkit is proposed by Maelstrom Research to answer these needs and support the creation of comprehensive and user-friendly study- and network-specific web-based metadata catalogues. Development of the Maelstrom Research cataloguing toolkit was initiated in 2004. It was supported by the exploration of existing catalogues and standards, and guided by input from partner initiatives having used or pilot tested incremental versions of the toolkit. The cataloguing toolkit is built upon two main components: a metadata model and a suite of open-source software applications. The model sets out specific fields to describe study profiles; characteristics of the subpopulations of participants; timing and design of data collection events; and datasets/variables collected at each data collection event. It also includes the possibility to annotate variables with different classification schemes. When combined, the model and software support implementation of study and variable catalogues and provide a powerful search engine to facilitate data discovery. The Maelstrom Research cataloguing toolkit already serves several national and international initiatives and the suite of software is available to new initiatives through the Maelstrom Research website. With the support of new and existing partners, we hope to ensure regular improvements of the toolkit.
Project description:Background Limited knowledge and unclear underlying biology of many rare diseases pose significant challenges to patients, clinicians, and scientists. To address these challenges, there is an urgent need to inspire and encourage scientists to propose and pursue innovative research studies that aim to uncover the genetic and molecular causes of more rare diseases and ultimately to identify effective therapeutic solutions. A clear understanding of current research efforts, knowledge/research gaps, and funding patterns as scientific evidence is crucial to systematically accelerate the pace of research discovery in rare diseases, which is an overarching goal of this study. Methods To semantically represent NIH funding data for rare diseases and advance its use in effectively promoting rare disease research, we identified NIH-funded projects for rare diseases by mapping GARD diseases to projects based on project titles; subsequently we presented and managed those identified projects in a knowledge graph using Neo4j software, hosted at NCATS, based on a pre-defined data model that captures semantics among the data. With this knowledge graph, we were able to perform several case studies to demonstrate scientific evidence generation for supporting rare disease research discovery. Results Of 5001 rare diseases belonging to 32 distinct disease categories, we identified 1294 diseases that are mapped to 45,647 distinct, NIH-funded projects obtained from the NIH ExPORTER by implementing semantic annotation of project titles. To capture semantic relationships present among the mapped research funding data, we defined a data model comprising seven primary classes and corresponding object and data properties. A Neo4j knowledge graph based on this predefined data model has been developed, and we performed multiple case studies over this knowledge graph to demonstrate its use in directing and promoting rare disease research. 
Conclusion We developed an integrative knowledge graph with rare disease funding data and demonstrated its use as a source from where we can effectively identify and generate scientific evidence to support rare disease research. With the success of this preliminary study, we plan to implement advanced computational approaches for analyzing more funding related data, e.g., project abstracts and PubMed article abstracts, and linking to other types of biomedical data to perform more sophisticated research gap analysis and identify opportunities for future research in rare diseases. Supplementary Information The online version contains supplementary material available at 10.1186/s13023-021-02120-9.
Project description:We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.
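To make the idea of a log-linear model for count data concrete, the sketch below computes a per-gene score statistic for a two-class outcome under a Poisson log-linear null model. It is an illustration under stated assumptions, not the method of the abstract: sequencing depth is estimated here from raw total counts, which is a simple stand-in for the new normalization the authors propose, and the function names and toy counts are hypothetical. Conditional on a gene's total count, the class-1 count is binomial under the null, which gives the expected value and variance used in the statistic.

```python
def depth_fractions(counts):
    """Estimate each sample's share of sequencing depth from its total count.

    counts: list of samples, each a list of per-gene read counts.
    Total-count depth is an assumption, not the paper's normalization.
    """
    totals = [sum(sample) for sample in counts]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

def two_class_scores(counts, labels):
    """Score statistic per gene for class 1 vs. class 0 under a Poisson null.

    labels: one 0/1 class label per sample.
    Under the null, the class-1 count of a gene, given the gene's total,
    is Binomial(gene_total, p1), where p1 is the depth share of class 1.
    """
    depth = depth_fractions(counts)
    p1 = sum(d for d, y in zip(depth, labels) if y == 1)
    scores = []
    for j in range(len(counts[0])):
        gene_total = sum(sample[j] for sample in counts)
        observed = sum(sample[j] for sample, y in zip(counts, labels) if y == 1)
        expected = gene_total * p1
        variance = gene_total * p1 * (1.0 - p1)
        scores.append((observed - expected) ** 2 / variance if variance > 0 else 0.0)
    return scores

# Toy data: 4 samples x 2 genes; only the second gene differs between classes.
toy_counts = [[100, 10], [100, 10], [100, 30], [100, 30]]
toy_labels = [0, 0, 1, 1]
toy_scores = two_class_scores(toy_counts, toy_labels)
```

In the toy data the changed gene also inflates the class-1 depth estimates, so the unchanged gene receives a nonzero score; this sensitivity of total-count depth to differentially expressed genes is one reason normalization is singled out as challenging.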
Project description:We prepared small RNA libraries from 29 tumor/normal pairs of human cervical tissue samples. Analysis of the resulting sequences (42 million in total) defined 64 new human microRNA (miRNA) genes. Both arms of the hairpin precursor were observed in twenty-three of the newly identified miRNA candidates. We tested several computational approaches for analysis of class differences between high-throughput sequencing datasets, and describe a novel application of a log-linear model that has provided the most effective analysis for these data. This method resulted in the identification of 67 miRNAs that were differentially expressed between the tumor and normal samples at a false discovery rate less than 0.001. A total of 29 tumor/normal pairs of human cervical tissue samples were analyzed. Two samples (G699N_2 and G761T_2) were run in duplicate. No Fastq files are available for GSM532871 to GSM532889, GSM532929, and GSM532930. Sequence files are provided as text files for these 22 Sample records in GSE20592_RAW.tar. 38 samples with quality scores are available from SRA as SRP002/SRP002326 (see Supplementary file below).