A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia [RNA-Seq]
ABSTRACT: We demonstrate a promising approach to identify robust molecular markers for targeted treatment of acute myeloid leukemia. We show that our method outperforms several state-of-the-art approaches in identifying molecular markers replicated in validation data and predicting drug sensitivity accurately. Finally, we identify SMARCA4 as a marker and driver of sensitivity to topoisomerase II inhibitors, mitoxantrone and etoposide, in AML by showing that cell lines transduced to have high SMARCA4 expression reveal dramatically increased sensitivity to these agents.
Project description: Cancers that appear pathologically similar often respond differently to the same drug regimens, and methods to better match patients to drugs are in high demand. We demonstrate a promising approach to identify robust molecular markers for targeted treatment of acute myeloid leukemia (AML) by introducing (1) data from 30 AML patients, including genome-wide gene expression profiles and in vitro sensitivity to 160 chemotherapy drugs, and (2) a computational method that identifies reliable gene expression markers for drug sensitivity by incorporating multi-omic prior information relevant to each gene's potential to drive cancer. We show that our method outperforms several state-of-the-art approaches in identifying molecular markers that replicate in validation data and in predicting drug sensitivity accurately. Finally, we identify SMARCA4 as a marker and driver of sensitivity to the topoisomerase II inhibitors mitoxantrone and etoposide in AML by showing that cell lines transduced to have high SMARCA4 expression show dramatically increased sensitivity to these agents.
Project description: Medical data are among the most rewarding and yet most complicated data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard and incomplete healthcare data. It not only forecasts but also helps in decision making, and it is increasingly regarded as a breakthrough in the ongoing effort to improve the quality of patient care and reduce healthcare costs. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (unusually high incidence of a particular disease), anomaly detection (detection of disease) and association, as well as their respective advantages, drawbacks and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations.
Project description: Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining large and varied data sets. While integrating and manipulating diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure poses challenges, it also provides a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions to the challenges of manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. We also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.
Project description: The aim of this study is to analyze the change in genome-wide expression levels in HAP1 cells upon loss of SMARCB1, SMARCA4, or both genes together. SMARCB1 and SMARCA4 were hits from a genome-wide gene-trap mutagenesis screen for new players involved in sensitivity to doxorubicin (Dox). Loss of SMARCB1 or SMARCA4 was found to impart resistance to Dox in HAP1 cells. To validate this, the genes were knocked out in HAP1 cells with CRISPR-Cas9 technology. Gene expression levels in SMARCB1-null, SMARCA4-null and SMARCB1-SMARCA4 double-null cells were compared to wild-type HAP1 cells using RNA-seq. These experiments showed that SMARCB1 loss caused a several-fold increase in ABCB1 expression. ABCB1 encodes an efflux pump responsible for flushing many small-molecule drugs out of cells. Further analysis confirmed that ABCB1 was the main factor responsible for Dox resistance upon SMARCB1 loss. There are four cell types with two replicates each, for 8 samples in total.
Project description: This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford. Such 'Big Scientific Data' comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility and the UK's Central Laser Facility. Increasingly, scientists are required to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and to help find new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now used deep learning to develop its AlphaFold tool for predicting protein folding, and it has achieved some spectacular results for this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the RAL, we focus on challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing realistic machine learning benchmarks using Big Scientific Data from several different scientific domains. We conclude with some initial examples of our 'scientific machine learning' benchmark suite and of the research challenges these benchmarks will enable. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.