Project description: The Library of Integrated Network-based Cellular Signatures (LINCS) L1000 data set currently comprises over a million gene expression profiles of chemically perturbed human cell lines. Through several unique intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves the signal-to-noise ratio compared with the MODZ method currently used to compute L1000 signatures. The CD-processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS2. The L1000CDS2 search engine uses two methods to prioritize thousands of small-molecule signatures, and their pairwise combinations, that are predicted to either mimic or reverse an input gene expression signature. The L1000CDS2 search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the Gene Expression Omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS2 to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and to prioritize small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study to further demonstrate the utility of L1000CDS2, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying L1000CDS2 with these signatures, we identified kenpaullone, a GSK3B/CDK2 inhibitor that, as we show in subsequent experiments, has dose-dependent efficacy in inhibiting Ebola infection in vitro without causing cellular toxicity in human cell lines.
In summary, the L1000CDS2 tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource.
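Target prediction above is described as a cosine-similarity comparison between signature vectors, and mimic/reverse prioritization can be illustrated the same way. The Python sketch below assumes each signature is already a numeric vector over a shared, ordered gene list (it does not implement the characteristic direction computation), and it treats "mimic" as cosine near +1 and "reverse" as cosine near -1, which is an illustrative assumption rather than the exact L1000CDS2 scoring. The drug names and values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length signature vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero signature carries no direction
    return dot / (norm_a * norm_b)

def rank_signatures(query, library, mode="mimic"):
    """Rank library signatures against a query signature.

    mode="mimic": most similar first (cosine closest to +1).
    mode="reverse": most anti-similar first (cosine closest to -1).
    """
    if mode not in ("mimic", "reverse"):
        raise ValueError("mode must be 'mimic' or 'reverse'")
    scores = {name: cosine_similarity(query, sig) for name, sig in library.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=(mode == "mimic"))

# Hypothetical three-gene signatures, for illustration only.
query = [1.0, -0.5, 0.2]
library = {
    "drug_A": [0.9, -0.4, 0.1],
    "drug_B": [-1.0, 0.5, -0.2],
    "drug_C": [0.0, 1.0, 0.0],
}
print(rank_signatures(query, library, "mimic")[0][0])    # drug_A
print(rank_signatures(query, library, "reverse")[0][0])  # drug_B
```

Cosine similarity ignores signature magnitude, so two perturbations that move the same genes in the same directions score highly even if one effect is much stronger.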
Project description: For the Library of Integrated Network-based Cellular Signatures (LINCS) project, many gene expression signatures have been produced using the L1000 technology, a cost-effective method for profiling gene expression at large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating much of the currently available LINCS L1000 data. LCB implements two compacted layered canvases: one visualizes clustered L1000 expression data, and the other displays enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources, offering unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites.
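The LCB search interface ranks conditions against a user-supplied gene list, but the scoring it uses is not described here. As an assumption, the sketch below scores each condition with a one-sided hypergeometric (Fisher-style) overlap test, a common choice for gene-list enrichment. `top_matching_conditions`, the condition names and the 100-gene universe are hypothetical.

```python
import math

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total

def top_matching_conditions(query_genes, conditions, universe_size, top=10):
    """Rank conditions by the overlap p-value between the query and each condition's DE genes."""
    query = set(query_genes)
    results = []
    for name, de_genes in conditions.items():
        de = set(de_genes)
        overlap = len(query & de)
        p = hypergeom_sf(overlap, universe_size, len(de), len(query))
        results.append((name, overlap, p))
    # Smallest p-value first; a larger overlap breaks ties.
    results.sort(key=lambda r: (r[2], -r[1]))
    return results[:top]

# Hypothetical differentially expressed gene sets in a 100-gene universe.
conditions = {
    "cond1": ["A", "B", "C", "X"],
    "cond2": ["Y", "Z", "W", "A"],
    "cond3": ["Q", "R"],
}
hits = top_matching_conditions(["A", "B", "C", "D"], conditions, universe_size=100)
print([name for name, _, _ in hits])  # ['cond1', 'cond2', 'cond3']
```

Exact integer binomials keep the sketch dependency-free; at the scale of 100 000 conditions a vectorized or log-space implementation would be the practical choice.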
Project description: Summary: Accessing collections of perturbed gene expression profiles, such as the LINCS L1000 connectivity map, is usually performed at the individual dataset level, followed by a summary obtained by counting individual hits for each perturbagen. With the metaLINCS R package, we present an alternative approach that combines rank correlation and gene set enrichment analysis to identify meta-level enrichment at the perturbagen level and, in the case of drugs, at the mechanism-of-action level. This significantly simplifies interpretation and highlights overarching themes in the data. We demonstrate the functionality of the package and compare its performance against three currently used approaches.
Availability and implementation: metaLINCS is released under the GPL3 license. Source code and documentation are freely available on GitHub (https://github.com/bigomics/metaLINCS).
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Project description: Gene expression data can offer deep physiological insights beyond the static coding of the genome alone. We believe that realizing this potential requires specialized, high-capacity machine learning methods capable of exploiting underlying biological structure, but the development of such models is hampered by the lack of published benchmark tasks and well-characterized baselines. In this work, we establish such benchmarks and baselines by profiling many classifiers on biologically motivated tasks using two curated views of a large, public gene expression dataset (the LINCS corpus) and one privately produced dataset. We provide these two curated views of the public LINCS dataset, along with our benchmark tasks, to enable direct comparisons with future methodological work and to help spur the development of deep learning methods for this modality. In addition to profiling a battery of traditional classifiers, including linear models, random forests, decision trees, K nearest neighbor (KNN) classifiers, and feed-forward artificial neural networks (FF-ANNs), we also test a method novel to this data modality: graph convolutional neural networks (GCNNs), which allow us to incorporate prior biological domain knowledge. We find that GCNNs can be highly performant given large datasets, whereas FF-ANNs perform well consistently. Among the non-neural classifiers, linear models and KNN classifiers perform best.
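As a minimal sketch of how a GCNN can inject prior biological knowledge, the code below applies one graph-convolution step in the widely used symmetric-normalized form H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is a gene-gene adjacency matrix (the prior knowledge), H holds per-gene input features, and W is a learned weight matrix. This particular formulation is an assumption; the GCNN architecture used in the work above may differ. The three-gene graph and the weights are toy values.

```python
import math

def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a non-negative adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # self-loops guarantee deg > 0
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One propagation step H' = ReLU(A_norm @ H @ W).

    adj: n x n gene-gene graph; features: n x f_in (one row per gene);
    weights: f_in x f_out.
    """
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]

# Toy path graph gene0 - gene1 - gene2, with signal only on gene0.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]
out = gcn_layer(adj, features, weights)
print(out)
```

After one step, gene0 keeps half of its own signal, its neighbour gene1 receives 1/sqrt(6) of it, and gene2, two hops away, receives nothing; stacking layers widens the receptive field along the prior-knowledge graph.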
Project description: The Library of Integrated Network-based Cellular Signatures (LINCS) L1000 big data provide gene expression profiles induced by over 10 000 compounds, shRNAs, and kinase inhibitors using the L1000 platform. We developed csNMF, a systematic compound signature discovery pipeline that covers every stage from raw L1000 data processing to drug screening and mechanism generation. The csNMF pipeline demonstrated better performance than the original L1000 pipeline. The discovered compound signatures for breast cancer were consistent with the LINCS KINOMEscan data and were clinically relevant. The csNMF pipeline provides a novel and complete tool to expedite signature-based drug discovery that leverages the LINCS L1000 resources.
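The abstract does not detail how csNMF applies matrix factorization, so the sketch below shows only generic non-negative matrix factorization (NMF) with the Lee-Seung multiplicative updates, which reduce the squared Frobenius error ||V - WH||^2 while keeping W and H non-negative. Treating V as a non-negative genes-by-profiles matrix is an assumption for illustration, and `nmf` is a hypothetical helper, not part of the csNMF pipeline.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ WH with V, W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h

# Toy non-negative matrix of rank 2 (rows 3 and 4 are combinations of rows 1 and 2).
v = [[1, 0, 2], [0, 1, 1], [2, 1, 5], [1, 1, 3]]
w, h = nmf(v, rank=2, iters=300)
print(round(frobenius_error(v, w, h), 4))
```

Because every update multiplies by a ratio of non-negative terms, W and H never turn negative, which is what makes the learned factors interpretable as additive signatures.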
Project description: Motivation: Adverse drug reactions (ADRs) are a central consideration during drug development. Here we present a machine learning classifier that prioritizes ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data come from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset, which measured changes in GE before and after treatment of human cells with over 20 000 small-molecule compounds, including most of the FDA-approved drugs. Using various benchmarking methods, we show that integrating GE data with the CS of the drugs can significantly improve the predictability of ADRs. Moreover, transforming GE features into enrichment vectors of biological terms further improves the predictive capability of the classifiers. The most predictive biological-term features can help explain the drugs' mechanisms of action. Finally, we applied the classifier to all >20 000 profiled small molecules and developed a web portal for browsing and searching predicted small-molecule/ADR connections.
Availability and implementation: The interface for the adverse event predictions for the >20 000 LINCS compounds is available at http://maayanlab.net/SEP-L1000/
Contact: avi.maayan@mssm.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
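The abstract does not name the classifier, so the sketch below swaps in plain logistic regression trained by batch gradient descent to show the core idea of concatenating a chemical-structure fingerprint with gene expression features into one input vector. The fingerprint bits, GE values and ADR labels are hypothetical toy data; a realistic model would use far more features and proper cross-validation.

```python
import math

def combine_features(fingerprint_bits, ge_features):
    """Concatenate a binary chemical-structure fingerprint with GE features."""
    return [float(b) for b in fingerprint_bits] + [float(g) for g in ge_features]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; ys are 0/1 labels for one ADR."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy data: 2 fingerprint bits + 1 GE feature per compound.
xs = [
    combine_features([1, 0], [0.8]),
    combine_features([1, 1], [0.9]),
    combine_features([0, 1], [-0.7]),
    combine_features([0, 0], [-0.9]),
]
ys = [1, 1, 0, 0]
w, b = train_logistic(xs, ys)
print([round(predict_proba(w, b, x), 2) for x in xs])
```

Replacing the raw GE block with per-term enrichment scores, as the abstract describes, only changes what `combine_features` receives; the classifier itself is unchanged.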
Project description: A powerful means of understanding the cellular function of corrupt oncogenic signaling programs is to perturb the system and monitor the downstream consequences. Here, using a unique pair of patient-derived non-small cell lung cancer (NSCLC) and normal lung epithelial cell lines (HCC4017/HBEC30KT), we systematically interrogated the remodeling of the NSCLC proteome upon treatment with 35 chemical perturbagens targeting a diverse array of mechanistic classes. HCC4017 and HBEC30KT cells differ significantly in their proteomic responses to the same compound treatments. Using protein covariance analyses, we identified a large number of functional protein networks. For example, we found that a poorly studied protein, C5orf22, is a novel component of the WBP11/PQBP1 splicing complex. Depletion of C5orf22 leads to aberrant splicing and expression of genes involved in cell growth and immunomodulation. In summary, we show that systematically measuring tumor adaptive responses at the proteomic level can generate critical circuit-level biological insights into these pharmacologic perturbagens.
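The specific covariance method is not described above. As an assumption, the sketch below correlates protein abundance profiles across treatments with Pearson's r, keeps pairs whose |r| meets a threshold as network edges, and groups connected proteins into candidate modules. Protein names, abundance values and the 0.8 threshold are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def covariance_network(abundance, threshold=0.8):
    """abundance: {protein: [response per treatment]} -> edges (p1, p2, r) with |r| >= threshold."""
    edges = []
    for p1, p2 in combinations(sorted(abundance), 2):
        r = pearson(abundance[p1], abundance[p2])
        if abs(r) >= threshold:
            edges.append((p1, p2, r))
    return edges

def modules(edges):
    """Connected components of the co-variation graph, as candidate protein networks."""
    graph = {}
    for p1, p2, _ in edges:
        graph.setdefault(p1, set()).add(p2)
        graph.setdefault(p2, set()).add(p1)
    seen, comps = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            cur = stack.pop()
            if cur not in comp:
                comp.add(cur)
                stack.extend(graph[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Hypothetical responses of three proteins across four treatments.
abundance = {
    "protA": [1.0, 2.0, 3.0, 4.0],
    "protB": [2.0, 4.0, 6.0, 8.5],
    "protC": [1.0, -1.0, 1.0, -1.0],
}
edges = covariance_network(abundance)
print(modules(edges))  # one module containing protA and protB
```

Proteins that rise and fall together across many perturbations are candidates for shared complexes or pathways, which is the kind of signal that can place an uncharacterized protein into a known complex.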
Project description: The L1000 technology, a cost-effective high-throughput transcriptomics technology, has been applied to profile a collection of human cell lines for their gene expression response to >30,000 chemical and genetic perturbations. In total, over 3 million L1000 profiles are currently available. Such a dataset is invaluable for the discovery of drug and target candidates and for inferring mechanisms of action of small molecules. The L1000 assay measures the mRNA expression of only 978 landmark genes, while 11,350 additional genes are reliably inferred computationally. The lack of full genome coverage limits knowledge discovery for half of the human protein-coding genes, as well as the potential for integration with other transcriptomics profiling data. Here we present a two-step deep learning model that transforms L1000 profiles into RNA-seq-like profiles. The input to the model is the measured expression of the 978 landmark genes, and the output is a vector of 23,614 RNA-seq-like gene expression values. The model first transforms the landmark genes into RNA-seq-like 978-gene profiles using a modified CycleGAN model applied to unpaired data. The transformed 978 RNA-seq-like landmark genes are then extrapolated into the full genome space with a fully connected neural network model. When tested on a published paired L1000/RNA-seq dataset produced by the LINCS and GTEx programs, the two-step model achieves a Pearson's correlation coefficient of 0.914 and a root mean square error of 1.167. The processed RNA-seq-like profiles are made available for download, signature search, and gene-centric reverse search, and are illustrated with unique case studies.
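The two reported metrics, Pearson's correlation coefficient and root mean square error, can be computed as below. Averaging the per-profile metrics across paired samples is an assumption; the evaluation might instead be done per gene or over all values pooled. The profiles in the usage example are toy values.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson's r is undefined for a constant vector")
    return cov / (sx * sy)

def rmse(predicted, measured):
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))

def evaluate(predicted_profiles, measured_profiles):
    """Mean per-profile Pearson's r and RMSE over paired samples."""
    pairs = list(zip(predicted_profiles, measured_profiles))
    mean_r = sum(pearson_r(p, m) for p, m in pairs) / len(pairs)
    mean_rmse = sum(rmse(p, m) for p, m in pairs) / len(pairs)
    return mean_r, mean_rmse

# Toy paired profiles: the second prediction has the right shape but the wrong scale.
predicted = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
measured = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
mean_r, mean_rmse = evaluate(predicted, measured)
print(round(mean_r, 3), round(mean_rmse, 3))
```

The example shows why both metrics are reported: a prediction can be perfectly correlated with the measurement (r = 1) yet still carry a large RMSE because its scale is off.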
Project description: Globally, nearly 40 percent of all diabetic patients develop serious diabetic kidney disease (DKD). Identifying potential early-stage biomarkers and elucidating their underlying molecular mechanisms in DKD is therefore needed. In this study, we performed an integrated bioinformatics analysis of the expression profiles GSE111154, GSE30528 and GSE30529, associated with early diabetic nephropathy (EDN), glomerular DKD (GDKD) and tubular DKD (TDKD), respectively. A total of 1,241, 318 and 280 differentially expressed genes (DEGs) were identified for GSE30528, GSE30529 and GSE111154, respectively. Subsequently, 280 upregulated and 27 downregulated DEGs shared among the three GSE datasets were identified. Further analysis of the expression levels of the hub genes revealed SPARC (Secreted Protein Acidic And Cysteine Rich), POSTN (periostin), LUM (Lumican), KNG1 (Kininogen 1), FN1 (Fibronectin 1), VCAN (Versican) and PTPRO (Protein Tyrosine Phosphatase Receptor Type O) as having potential roles in DKD progression. FN1, LUM and VCAN were identified as upregulated genes in GDKD, whereas downregulation of PTPRO was associated with all three diseases. POSTN and SPARC were identified as overexpressed putative biomarkers, whereas KNG1 was found to be downregulated in TDKD. Additionally, we identified two drugs, pidorubicine, a topoisomerase inhibitor (LINCS ID: BRD-K04548931), and a Polo-like kinase inhibitor (LINCS ID: BRD-K41652870), with a validated role in reversing the differential gene expression patterns observed in the three GSE datasets. Collectively, this study aids in understanding the molecular drivers, critical genes and pathways that underlie DKD initiation and progression.
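Finding DEGs shared among several datasets amounts to calling up- and downregulated genes per dataset and intersecting the results by direction. The sketch below assumes a simple |log2 fold change| >= 1 cutoff; the study's actual criteria, such as adjusted p-value thresholds, are not given here. The dataset keys, gene names and fold changes are hypothetical.

```python
def call_degs(log_fc, threshold=1.0):
    """Split one dataset's {gene: log2 fold change} into up- and downregulated sets."""
    up = {g for g, fc in log_fc.items() if fc >= threshold}
    down = {g for g, fc in log_fc.items() if fc <= -threshold}
    return up, down

def shared_degs(datasets, threshold=1.0):
    """Genes that are up (or down) in every dataset."""
    if not datasets:
        raise ValueError("need at least one dataset")
    ups, downs = [], []
    for log_fc in datasets.values():
        up, down = call_degs(log_fc, threshold)
        ups.append(up)
        downs.append(down)
    return set.intersection(*ups), set.intersection(*downs)

# Hypothetical log2 fold changes for three datasets.
datasets = {
    "ds1": {"GENE1": 2.0, "GENE2": -1.5, "GENE3": 0.2},
    "ds2": {"GENE1": 1.2, "GENE2": -2.0, "GENE3": 1.5},
    "ds3": {"GENE1": 3.0, "GENE2": -1.1, "GENE3": -1.0},
}
shared_up, shared_down = shared_degs(datasets)
print(sorted(shared_up), sorted(shared_down))  # ['GENE1'] ['GENE2']
```

Intersecting by direction matters: GENE3 is a DEG in two datasets but changes in opposite directions, so it is excluded from both shared sets.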