Project description:RNA-sequencing (RNA-seq) is widely used to analyze alternative splicing, but in practice it has inherent biases that hinder its ability to detect and quantify splicing events. To address this, we present a targeted RNA-seq method that specifically enriches for splicing-informative junction-spanning reads. Local Splicing Variation sequencing (LSV-seq) uses multiplexed reverse transcription from highly scalable pools of primers anchored near splice junctions of interest. Primers are designed using Optimal Prime, a novel dedicated machine learning algorithm trained on the performance of thousands of primer sequences. LSV-seq achieves high on-target capture rates and concordance with RNA-seq, while requiring several-fold lower sequencing depth. We use LSV-seq to target events with low coverage in GTEx RNA-seq data and discover hundreds of previously hidden tissue-specific splicing events. Our results demonstrate the ability of LSV-seq to capture alternative splicing with exceptional sensitivity and highlight its potential to improve the detection of other RNA features of interest.
Project description:RNA was isolated from 199 primary breast cancer patients and used to generate gene expression profiles. Samples 1-176 were used in another study (GEO Series GSE22820) and form the training set in this study; samples 200-222 form the validation set. These data were used to build a machine learning classifier that predicts estrogen receptor (ER) status using only three gene features.
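The sample numbering in the description above can be checked with a short sketch. It assumes that both ranges are inclusive and consecutively numbered, which the description implies but does not state; the variable names are illustrative only.

```python
# Sample numbering as given in the description (assumed inclusive ranges).
training_ids = list(range(1, 177))       # samples 1-176 (shared with GSE22820)
validation_ids = list(range(200, 223))   # samples 200-222

# The two sets should not overlap and should account for all 199 patients.
overlap = set(training_ids) & set(validation_ids)
total = len(training_ids) + len(validation_ids)

print(len(training_ids), len(validation_ids), total)
```

Under these assumptions the training set holds 176 samples and the validation set 23, which together match the 199 patients reported.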
Project description:Stem cell organoids are powerful models for studying organ development, disease modeling, drug screening, and regenerative medicine applications. The convergence of organoid technology, tissue engineering, and artificial intelligence (AI) could enhance our understanding of the design principles for organoid engineering. In this study, we used micropatterning techniques to create a designer library of 230 cardiac organoids with 7 geometric designs (Circle 200, Circle 600, Circle 1000, Rectangle 1:1, Rectangle 1:4, Star 1:1, and Star 1:4). We employed manifold learning techniques to analyze single-organoid heterogeneity based on 10 physiological parameters. Using unsupervised machine learning approaches, we clustered and refined our cardiac organoids based on their functional similarity, thus elucidating unique functionalities associated with geometric designs. We also highlighted the critical role of calcium rising time in distinguishing organoids based on geometric patterns and clustering results. This integration of organoid engineering and machine learning enhances our understanding of structure-function relationships in cardiac organoids, paving the way for more controlled and optimized organoid design.
Project description:We evaluated how well various supervised machine learning methods, such as decision tree, partial least squares discriminant analysis (PLSDA), support vector machine, and random forest, classify endometriosis samples versus control samples when trained on both transcriptomics and methylomics data. The assessment was done from two perspectives for improving classification performance: (a) the effect of three different normalization techniques, and (b) the effect of differential analysis using the generalized linear model (GLM). We concluded that an appropriate machine learning diagnostic pipeline for endometriosis should use TMM normalization for transcriptomics data, quantile or voom normalization for methylomics data, and GLM for feature space reduction and classification performance maximization.
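As an illustration of one of the normalization techniques named above, the following is a minimal sketch of quantile normalization in plain Python. It is not the pipeline used in the study: the function name `quantile_normalize`, the features-by-samples layout, and the simple tie handling (ties broken by row order rather than averaged) are all assumptions made for this example.

```python
from statistics import mean

def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a features x samples matrix.

    Each column is forced onto the same reference distribution, defined as
    the mean across samples of the values at each rank. Ties are broken by
    row order (a simplification; production tools usually average ties).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # Sort each sample's values independently.
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_cols)]

    # Reference distribution: mean across samples at each rank.
    reference = [mean(col[r] for col in sorted_cols) for r in range(n_rows)]

    # Replace each value with the reference value at its rank within its sample.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result

# Toy input: 3 features (rows) x 2 samples (columns); values are made up.
toy = [[1.0, 4.0],
       [3.0, 2.0],
       [2.0, 6.0]]
normalized = quantile_normalize(toy)
print(normalized)
```

After normalization every sample column contains the same set of values (here 1.5, 3.0, and 4.5), so differences between samples reflect rank order rather than overall scale.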
Project description:Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their lack of interpretability: despite providing accurate predictions, they cannot describe how they arrived at those predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed novel components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.