Project description:We developed Solo, a semi-supervised deep learning framework for identifying doublets in scRNA-seq analysis. To validate our method, we used MULTI-seq, which labels cells with cholesterol-modified oligos (CMOs), to experimentally identify doublets in a solid tissue with diverse cell types (mouse kidney), and showed that Solo recapitulated the experimentally identified doublets.
Project description:The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has recently been observed, however, that no single inference method performs optimally across all datasets. It has also been shown that integrating predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it to carry substantial benefits. These would come from automatic adaptation to patterns in the outputs of individual inference methods, so that regulatory interactions can be identified more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in a multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827.
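The combination idea above can be illustrated with a minimal sketch of the simplest ensemble baseline: averaging the per-method ranks of each candidate regulatory edge. This is only an illustrative baseline, not the paper's iterative semi-supervised multi-view algorithm, and all names and scores below are hypothetical.

```python
# Minimal sketch: combine edge-confidence scores from several GRN inference
# methods by average rank. Illustrative baseline only; the paper's actual
# algorithm is an iterative semi-supervised ensemble.

def rank_scores(scores):
    """Map {edge: score} to {edge: rank}, where rank 0 = highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: r for r, edge in enumerate(ordered)}

def ensemble_rank(predictions):
    """Average the per-method ranks of each candidate edge (lower = better)."""
    edges = set().union(*predictions)
    ranks = [rank_scores(p) for p in predictions]
    n = len(predictions)
    # An edge a method never scored gets that method's worst possible rank.
    return {e: sum(r.get(e, len(r)) for r in ranks) / n for e in edges}

# Hypothetical example: two methods scoring three candidate edges.
m1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
m2 = {("g1", "g2"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): 0.1}
combined = ensemble_rank([m1, m2])
best = min(combined, key=combined.get)  # ("g1", "g2")
```

Here ("g1", "g2") wins because both methods rank it first; a learned combiner, as proposed in the paper, would instead be trained to weight each method's output.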
Project description:Deep learning and computer vision algorithms can deliver highly accurate and automated interpretation of medical imaging to augment and assist clinicians. However, medical imaging presents uniquely pertinent obstacles, such as a lack of accessible data and a high cost of annotation. To address this, we developed data-efficient deep learning classifiers for prediction tasks in cardiology. Using pipelined supervised models to focus on relevant structures, we achieve an accuracy of 94.4% for 15-view still-image echocardiographic view classification and 91.2% accuracy for binary left ventricular hypertrophy classification. We then develop semi-supervised generative adversarial network models that can learn from both labeled and unlabeled data in a generalizable fashion. We achieve greater than 80% accuracy in view classification with only 4% of the labeled data used in solely supervised techniques, and achieve 92.3% accuracy for left ventricular hypertrophy classification. In exploring trade-offs between model type, resolution, data resources, and performance, we present a comprehensive analysis of, and improvements to, efficient deep learning solutions for medical imaging assessment, especially in cardiology.
Project description:Disease classification based on machine learning has become a crucial research topic in the fields of genetics and molecular biology. Generally, disease classification involves a supervised learning style; i.e., it requires a large number of labelled samples to achieve good classification performance. In the majority of cases, however, labelled samples are hard to obtain, so the amount of training data is limited. Meanwhile, many unclassified (unlabelled) sequences have been deposited in public databases, which may help the training procedure. This approach is called semi-supervised learning and is very useful in many applications. Self-training can be implemented by incorporating samples from high to low confidence, which prevents noisy samples from degrading the robustness of semi-supervised learning during training. The deep forest method, with the hyperparameter settings used in this paper, can achieve excellent performance. Therefore, in this work, we propose a novel approach combining a deep learning model with self-training-based semi-supervised learning to improve performance in disease classification; it utilizes unlabelled samples through an update mechanism designed to increase the number of high-confidence pseudo-labelled samples. The experimental results show that our proposed model achieves good performance in disease classification and disease-causing gene identification.
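The self-training loop described above can be sketched as follows: fit a classifier on the labelled data, promote high-confidence predictions on unlabelled data to pseudo-labels, add them to the training set, and repeat. A trivial one-dimensional nearest-centroid "model" stands in for the deep forest here; all names and thresholds are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of self-training with high-confidence pseudo-labels.
# The nearest-centroid classifier is a stand-in for the deep forest.

def fit_centroids(X, y):
    """Mean feature value per class label."""
    cents = {}
    for label in set(y):
        vals = [x for x, l in zip(X, y) if l == label]
        cents[label] = sum(vals) / len(vals)
    return cents

def predict_with_confidence(cents, x):
    """Closest centroid wins; confidence = margin over the runner-up."""
    d = sorted((abs(x - c), label) for label, c in cents.items())
    margin = d[1][0] - d[0][0] if len(d) > 1 else float("inf")
    return d[0][1], margin

def self_train(X_lab, y_lab, X_unlab, threshold=1.0, rounds=5):
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        cents = fit_centroids(X, y)
        keep = []
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:      # promote high-confidence pseudo-label
                X.append(x); y.append(label)
            else:
                keep.append(x)         # defer low-confidence samples
        if len(keep) == len(pool):
            break                      # no new samples absorbed; stop early
        pool = keep
    return fit_centroids(X, y)

# Hypothetical data: two labelled samples, three unlabelled ones.
cents = self_train([0.0, 10.0], ["healthy", "disease"], [1.0, 9.0, 5.2])
```

Note how the ambiguous sample at 5.2 never crosses the confidence threshold and is simply left out, which is exactly the noise-avoidance behaviour the abstract attributes to confidence-ordered self-training.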
Project description:Investigating the 3D structures and rearrangements of organelles within a single cell is critical for better characterizing cellular function. Imaging approaches such as soft X-ray tomography have been widely applied to reveal a complex subcellular organization involving multiple inter-organelle interactions. However, 3D segmentation of organelle instances has been challenging despite its importance in organelle characterization. Here we propose an intensity-based post-processing tool to identify and separate organelle instances. Our tool separates sphere-like (insulin vesicles) and columnar-shaped (mitochondria) organelle instances based on the intensity of raw tomograms, semantic segmentation masks, and organelle morphology. We validate our tool using synthetic tomograms of organelles and experimental tomograms of pancreatic β-cells to separate insulin vesicle and mitochondria instances. Compared to the commonly used connected-regions labeling, watershed, and watershed + Gaussian filter methods, our tool achieves improved accuracy in identifying organelles in the synthetic tomograms and an improved description of organelle structures in β-cell tomograms. In addition, under different experimental treatment conditions, significant changes in the volumes and intensities of both insulin vesicles and mitochondria are observed in our instance results, revealing their potential roles in maintaining normal β-cell function. Our tool is expected to be applicable to improving the instance segmentation of other images obtained from different cell types using multiple imaging modalities.
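The connected-regions labeling baseline mentioned above can be sketched with a simple flood fill over a binary segmentation mask: every group of touching foreground pixels gets one integer label. The sketch is 2-D and pure Python for brevity (the actual tomograms are 3-D), and the example mask is hypothetical.

```python
# Minimal sketch of the connected-regions labeling baseline: 4-connected
# flood fill assigns one integer label per touching foreground region.

from collections import deque

def label_regions(mask):
    """4-connected component labeling of a 2-D binary mask."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                current += 1
                labels[i][j] = current
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and \
                           mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

# Hypothetical mask: two separate regions are found, but two touching
# organelles would merge into one component -- the failure mode that
# motivates intensity-based instance separation.
mask = [[1, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 0, 0]]
labels, n = label_regions(mask)   # n == 2
```

This is why intensity and morphology cues are needed on top of the mask: connectivity alone cannot split adjacent instances.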
Project description:Background: With the development of modern sequencing technology, hundreds of thousands of single-cell RNA-sequencing (scRNA-seq) profiles allow exploration of heterogeneity at the single-cell level, but they face the challenges of high dimensionality and high sparsity. Dimensionality reduction is essential for downstream analysis, such as clustering to identify cell subpopulations. Usually, dimensionality reduction follows an unsupervised approach. Results: In this paper, we introduce a semi-supervised dimensionality reduction method named scSemiAE, which is based on an autoencoder model. It transfers the information contained in available datasets with cell subpopulation labels to guide the search for better low-dimensional representations, which can ease further analysis. Conclusions: Experiments on five public datasets show that scSemiAE outperforms both unsupervised and semi-supervised baselines, whether the transferred information, embodied in the number of labeled cells and labeled cell subpopulations, is abundant or scarce.
Project description:With a spatial resolution of tens of microns, ultrasound localization microscopy (ULM) reconstructs microvascular structures and measures intravascular flows by tracking microbubbles (1-5 μm) in contrast enhanced ultrasound (CEUS) images. Since the size of CEUS bubble traces, e.g. 0.5-1 mm for ultrasound with a wavelength λ = 280 μm, is typically two orders of magnitude larger than the bubble diameter, accurately localizing microbubbles in noisy CEUS data is vital to the fidelity of the ULM results. In this paper, we introduce a residual learning based supervised super-resolution blind deconvolution network (SupBD-net), and a new loss function for a self-supervised blind deconvolution network (SelfBD-net), for detecting bubble centers at a spatial resolution finer than λ/10. Our ultimate purpose is to improve the ability to distinguish closely located microvessels and the accuracy of the velocity profile measurements in macrovessels. Using realistic synthetic data, the performance of these methods is calibrated and compared against several recently introduced deep learning and blind deconvolution techniques. For bubble detection, errors in bubble center location increase with the trace size, noise level, and bubble concentration. For all cases, SupBD-net yields the least error, keeping it below 0.1 λ. For unknown bubble trace morphology, where all the supervised learning methods fail, SelfBD-net can still maintain an error of less than 0.15 λ. SupBD-net also outperforms the other methods in separating closely located bubbles and parallel microvessels. In macrovessels, SupBD-net maintains the least errors in the vessel radius and velocity profile after introducing a procedure that corrects for terminated tracks caused by overlapping traces. 
Application of these methods is demonstrated by mapping the cerebral microvasculature of a neonatal pig, where neighboring microvessels separated by 0.15 λ can be readily distinguished by SupBD-net and SelfBD-net, but not by the other techniques. Hence, the newly proposed residual learning based methods improve the spatial resolution and accuracy of ULM in micro- and macro-vessels.
Project description:Deep cerebellar nuclei are a key structure of the cerebellum that is involved in processing motor and sensory information. Accurate segmentation of deep cerebellar nuclei is thus a crucial step for understanding the cerebellar system and for its utility in deep brain stimulation treatment. However, it is challenging to clearly visualize such small nuclei under standard clinical magnetic resonance imaging (MRI) protocols, so precise segmentation has not been feasible. Recent advances in 7 Tesla (T) MRI technology and the great potential of deep neural networks facilitate automatic patient-specific segmentation. In this paper, we propose a novel deep learning framework (referred to as DCN-Net) for fast, accurate, and robust patient-specific segmentation of the deep cerebellar dentate and interposed nuclei on 7T diffusion MRI. DCN-Net effectively encodes contextual information from patch images through the proposed dilated dense blocks, without consecutive pooling operations or added complexity. During end-to-end training, label probabilities of the dentate and interposed nuclei are independently learned with a hybrid loss, handling highly imbalanced data. Finally, we utilize self-training strategies to cope with the problem of limited labeled data. To this end, auxiliary dentate and interposed nuclei labels are created on unlabeled data using DCN-Net trained on manual labels. We validate the proposed framework using 7T B0 MRIs from 60 subjects. Experimental results demonstrate that DCN-Net provides better segmentation than atlas-based deep cerebellar nuclei segmentation tools and other state-of-the-art deep neural networks in terms of accuracy and consistency. We further demonstrate the effectiveness of the proposed components within DCN-Net in dentate and interposed nuclei segmentation.
Project description:Reference-based methods have dominated approaches to the particle selection problem, proving fast and accurate on even the most challenging micrographs. A reference volume, however, is not always available, and compiling a set of reference projections from the micrographs themselves requires significant effort to attain the same level of accuracy. We propose a reference-free method to quickly extract particles from the micrograph. The method is augmented with a new semi-supervised machine-learning algorithm to accurately discriminate particles from contaminants and noise.