Project description:RNA was isolated from 199 primary breast cancer patients and used to generate gene expression profiles. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Samples 200-222 form a validation set. These data were used to build a machine learning classifier that predicts estrogen receptor (ER) status using only three gene features.
Project description:Stem cell organoids are powerful models for studying organ development and for disease modeling, drug screening, and regenerative medicine applications. The convergence of organoid technology, tissue engineering, and artificial intelligence (AI) could enhance our understanding of the design principles for organoid engineering. In this study, we used micropatterning techniques to create a designer library of 230 cardiac organoids with 7 geometric designs (Circle 200, Circle 600, Circle 1000, Rectangle 1:1, Rectangle 1:4, Star 1:1, and Star 1:4). We employed manifold learning techniques to analyze single-organoid heterogeneity based on 10 physiological parameters. Using unsupervised machine learning approaches, we clustered and refined our cardiac organoids based on their functional similarity, thus elucidating unique functionalities associated with geometric designs. We also highlighted the critical role of calcium rising time in distinguishing organoids based on geometric patterns and clustering results. This integration of organoid engineering and machine learning enhances our understanding of structure-function relationships in cardiac organoids, paving the way for more controlled and optimized organoid design.
Project description:We evaluated how well various supervised machine learning methods, such as decision tree, partial least squares discriminant analysis (PLSDA), support vector machine, and random forest, classify endometriosis samples versus control samples when trained on both transcriptomics and methylomics data. The assessment was done from two perspectives aimed at improving classification performance: (a) the effect of three different normalization techniques, and (b) the effect of differential analysis using the generalized linear model (GLM). We concluded that an appropriate machine learning diagnostic pipeline for endometriosis should use TMM normalization for transcriptomics data, quantile or voom normalization for methylomics data, and GLM-based differential analysis to reduce the feature space and maximize classification performance.
Project description:Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: despite providing accurate predictions, they cannot describe how they arrived at their predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed novel components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.
Project description:A continuum of macrophage polarization states is essential for tissue homeostasis. We used machine learning approaches to identify a universally relevant definition of macrophage polarization states and to create a predictive framework for developing macrophage-targeted precision diagnostics and therapeutics. CCDC88A was identified as a key gene in the continuum state clusters that is essential for the tolerant polarization state.