Project description:Here we present miR-eCLIP analysis of AGO2 in HEK293 cells to address the small RNA repertoire and uncover their physiological targets. We developed an optimized bioinformatics approach of chimeric read identification to detect chimeras of high confidence, which were used as a biologically validated input for miRBind, a deep learning method and web server that can be used to accurately predict the potential of miRNA:target site binding.
Project description:The binding of microRNAs (miRNAs) to their target sites is a complex process, mediated by the Argonaute (Ago) family of proteins. The prediction of miRNA:target site binding is an important first step for any miRNA target prediction algorithm. To date, the potential for miRNA:target site binding is evaluated using either co-folding free energy measures or heuristic approaches, based on the identification of binding 'seeds', i.e., continuous stretches of binding corresponding to specific parts of the miRNA. The limitations of both these families of methods have produced generations of miRNA target prediction algorithms that are primarily focused on 'canonical' seed targets, even though unbiased experimental methods have shown that only approximately half of in vivo miRNA targets are 'canonical'. Herein, we present miRBind, a deep learning method and web server that can be used to accurately predict the potential of miRNA:target site binding. We trained our method using seed-agnostic experimental data and show that our method outperforms both seed-based approaches and co-fold free energy approaches. The full code for the development of miRBind and a web server are freely available.
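To make the seed-based heuristic mentioned above concrete, the following minimal sketch checks whether a target sequence contains a perfect match to a miRNA seed. It is not part of miRBind. The choice of miRNA positions 2-7 as the seed is a common convention assumed here, and the function names are hypothetical.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def has_seed_match(mirna: str, target: str, start: int = 2, end: int = 7) -> bool:
    """Check for a contiguous, perfectly paired seed site in the target.

    Assumption: the seed is miRNA positions start..end (1-based, inclusive),
    defaulting to the commonly used 2-7 window.
    """
    seed = mirna[start - 1:end]
    return reverse_complement(seed) in target


# Example miRNA sequence (hsa-let-7a-5p) and two toy target fragments.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(has_seed_match(LET7A, "AAACUACCUCAAA"))  # target contains the seed site
print(has_seed_match(LET7A, "AAAAAAAAAAAAA"))  # target lacks the seed site
```

A rule like this only recognises 'canonical' sites, which is the limitation the description attributes to seed-based approaches.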
Project description:We developed a semi-supervised deep learning framework for the identification of doublets in scRNA-seq analysis called Solo. To validate our method, we used MULTI-seq, cholesterol modified oligos (CMOs), to experimentally identify doublets in a solid tissue with diverse cell types, mouse kidney, and showed Solo recapitulated experimentally identified doublets.
Project description:Fusarium head blight (FHB), incited by Fusarium graminearum Schwabe, is a devastating disease of barley and other cereal crops worldwide. Fusarium head blight is associated with trichothecene mycotoxins such as deoxynivalenol (DON), and contaminated grains are unfit for the malting or animal feed industries. While genetically resistant cultivars offer the best economic and environmentally responsible means to mitigate disease, parent lines with adequate resistance are limited in barley. Resistance breeding based upon quantitative genetic gains has been slow to date, due to the intensive labour requirements of disease nurseries. The development of high-throughput genome-wide molecular markers allows their application in genomic prediction models. A diverse genomic panel consisting of 400 two-row spring barley lines was assembled to focus on Canadian barley breeding programs. The panel was evaluated for FHB and DON content in three environments and over two years. Moreover, it was genotyped using an Illumina Infinium HTS iSelect custom beadchip array of single nucleotide polymorphic molecular markers (50K SNP), where over 23K molecular markers were polymorphic. Genomic prediction has been successfully demonstrated for reducing FHB and DON content in cereals using various statistically-based models of different underlying assumptions. Herein, we have studied an alternative method based on machine learning and compared it with a statistical approach. Two encoding techniques were utilized (categorical or Hardy-Weinberg frequencies), followed by selecting essential genomic markers for phenotype prediction. Subsequently, we applied a transformer-based deep learning algorithm to predict FHB and DON. Apart from the transformer method, we also implemented a Residual Fully Connected Neural Network (RFCNN). Pearson correlation coefficients were calculated to compare true vs. predicted outputs. Under most model scenarios, the use of all markers vs. selected markers marginally improved prediction performance, except for the RFCNN method for FHB (27.6%). Hardy-Weinberg encoding generally improved correlation for FHB (6.9%) and DON (9.6%) for the transformer. This study suggests the potential of the transformer-based method for genomic prediction of complex traits such as FHB or DON, having performed better than or equally to existing machine learning and statistical methods. Genomic prediction in barley for Fusarium head blight and deoxynivalenol content using a custom Illumina Infinium array (BarleySNP50-JHI) (www.illumina.com). Sample types included leaves from 400 barley genotypes mostly of Canadian origin. This series includes 400 genotypes assayed on an Illumina Infinium HTS platform 50K BeadChip.
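As a rough illustration of two steps named above, the sketch below encodes one marker by Hardy-Weinberg genotype frequencies and scores predictions with a Pearson correlation coefficient. The description does not specify how the Hardy-Weinberg encoding was computed, so the scheme here (replacing each genotype call with its expected frequency p², 2pq or q²) is an assumption, as are the function names and the 0/1/2 allele-count input format.

```python
from math import sqrt


def hardy_weinberg_encode(calls):
    """Replace each genotype call with its expected Hardy-Weinberg frequency.

    Assumption: calls holds alternate-allele counts (0, 1 or 2) for a single
    marker across all lines; this is an illustrative scheme, not necessarily
    the one used in the study.
    """
    q = sum(calls) / (2 * len(calls))  # alternate allele frequency
    p = 1.0 - q                        # reference allele frequency
    expected = {0: p * p, 1: 2 * p * q, 2: q * q}
    return [expected[g] for g in calls]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data: one marker scored on four lines, and true vs. predicted phenotypes.
encoded = hardy_weinberg_encode([0, 0, 0, 2])
r = pearson([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(encoded, round(r, 3))
```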
Project description:The organisation of the genome in nuclear space is an important frontier of biology. Chromosome conformation capture methods such as Hi-C and Micro-C produce genome-wide chromatin contact maps that provide rich data containing quantitative and qualitative information about genome architecture. Most conventional approaches to genome-wide chromosome conformation capture data are limited to the analysis of pre-defined features, and may therefore miss important biological information. One constraint is that biologically important features can be masked by high levels of technical noise in the data. Here we introduce Twins, a replicate-based method for deep learning from chromatin conformation contact maps. Using a Siamese network configuration, Twins learns to distinguish technical noise from biological variation and outperforms image similarity metrics across a range of biological systems. Features extracted by Twins from Hi-C maps after perturbation of cohesin and CTCF reflect the distinct biological functions of cohesin and CTCF in the formation of domains and boundaries, respectively. Twins distance metrics are biologically meaningful, as they mirror the density of cohesin and CTCF binding. Taken together, these properties make Twins a powerful tool for the exploration of chromosome conformation capture data, such as Hi-C, capture Hi-C, and Micro-C.
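The Siamese configuration referred to above can be sketched in a few lines: one encoder with shared weights embeds both inputs, and a distance between the embeddings is trained with a pairwise loss. This is a generic illustration, not the Twins architecture. The linear encoder, the contrastive loss, and all names are assumptions; here "same" would stand for a pair of replicates and "not same" for a pair from different conditions.

```python
from math import sqrt


def embed(x, weights):
    """Shared encoder: a single linear layer applied to a flattened input."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def siamese_distance(a, b, weights):
    """Embed both inputs with the SAME weights and return their Euclidean distance."""
    ea, eb = embed(a, weights), embed(b, weights)
    return sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))


def contrastive_loss(distance, same, margin=1.0):
    """Pull 'same' pairs together; push other pairs at least `margin` apart."""
    if same:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Toy example with an identity encoder and two 2-element "maps".
W = [[1.0, 0.0], [0.0, 1.0]]
d = siamese_distance([0.0, 0.0], [3.0, 4.0], W)
print(d, contrastive_loss(d, same=False))
```

In a real model the encoder would be a convolutional network and the weights would be learned by minimising the loss over many pairs of contact maps.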
Project description:We introduce Affinity Distillation (AD), a method for extracting thermodynamic affinities de-novo from in-vivo immunoprecipitation experiments using deep learning. We show that neural networks modeling base-resolution in-vivo binding profiles of yeast and mammalian TFs can accurately predict energetic impacts of varying underlying DNA sequence on TF binding. Systematic comparisons between Affinity Distillation predictions and other predictive algorithms consistently show that Affinity Distillation more accurately predicts affinities across a wide range of TF structural classes and DNA sequences. Affinity Distillation relies on in-silico marginalization against many sequence backgrounds, resulting in a higher dynamic range and more accurate predictions than motif discovery algorithms. Moreover, we show that Affinity Distillation can learn differential paralog-specific affinities, thereby making it possible to more accurately reconstruct regulatory networks in cells.
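The in-silico marginalization step described above can be illustrated as follows: a query sequence is inserted into many background sequences, and the change in the model's prediction is averaged over those backgrounds. This is a minimal sketch under stated assumptions, not the Affinity Distillation implementation. The `predict` callable stands in for a trained network, `toy_predict` is a hypothetical placeholder, and centred insertion is an assumed design choice.

```python
def insert_centered(background, motif):
    """Overwrite the centre of a background sequence with the motif."""
    start = (len(background) - len(motif)) // 2
    return background[:start] + motif + background[start + len(motif):]


def marginal_effect(predict, motif, backgrounds):
    """Average change in prediction caused by inserting the motif.

    `predict` is any callable mapping a sequence to a number; in practice it
    would be a trained binding-profile model (assumption).
    """
    deltas = [predict(insert_centered(bg, motif)) - predict(bg)
              for bg in backgrounds]
    return sum(deltas) / len(deltas)


def toy_predict(seq):
    """Hypothetical stand-in model: counts occurrences of one motif."""
    return float(seq.count("CACGTG"))


backgrounds = ["AAAAAAAAAA", "TTTTTTTTTT"]
print(marginal_effect(toy_predict, "CACGTG", backgrounds))
```

Averaging over many backgrounds reduces the influence of any single flanking context on the estimated effect of the inserted sequence.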
Project description:Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Predicting the behavior of these tools remains a challenge, a situation that could be addressed through enhanced pattern recognition from deep learning. Thus, we investigate Deep Neural Networks (DNN) to predict toehold switch function as a canonical riboswitch model in synthetic biology. To facilitate DNN training, we synthesized and characterized in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperformed (R2=0.43-0.70) previous state-of-the-art thermodynamic and kinetic models (R2=0.04-0.15) and allowed for human-understandable attention-visualizations (VIS4Map) to identify success and failure modes. This deep learning approach constitutes a major step forward in engineering and understanding of RNA synthetic biology.
Project description:Extracting urinary proteome spectral features with advanced mass spectrometry and machine learning algorithms could yield more accurate results for disease classification. We sought to establish a novel diagnostic model of kidney diseases by combining the XGBoost machine learning algorithm with complete urinary proteomic information.