Project description:Automated segmentation of cellular electron microscopy (EM) datasets remains a challenge. Supervised deep learning (DL) methods that rely on region-of-interest (ROI) annotations yield models that fail to generalize to unrelated datasets. Newer unsupervised DL algorithms require relevant pre-training images; however, pre-training on currently available EM datasets is computationally expensive and shows little value for unseen biological contexts, as these datasets are large and homogeneous. To address this issue, we present CEM500K, a nimble 25 GB dataset of 0.5 × 10⁶ unique 2D cellular EM images curated from nearly 600 three-dimensional (3D) and 10,000 two-dimensional (2D) images from >100 unrelated imaging projects. We show that models pre-trained on CEM500K learn features that are biologically relevant and resilient to meaningful image augmentations. Critically, we evaluate transfer learning from these pre-trained models on six publicly available and one newly derived benchmark segmentation task and report state-of-the-art results on each. We release the CEM500K dataset, pre-trained models and curation pipeline for model building and further expansion by the EM community. Data and code are available at https://www.ebi.ac.uk/pdbe/emdb/empiar/entry/10592/ and https://git.io/JLLTz.
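A minimal sketch of the transfer-learning setup evaluated above, assuming PyTorch, a ResNet-50 encoder, and a hypothetical weight file name (cem500k_encoder.pth); the released pre-trained models may use a different architecture or checkpoint format.

```python
# Transfer-learning sketch: plug a (hypothetically named) CEM500K pre-trained
# encoder into a simple segmentation model for a downstream EM task.
import os
import torch
import torch.nn as nn
from torchvision.models import resnet50

encoder = resnet50(weights=None)
if os.path.exists("cem500k_encoder.pth"):             # hypothetical checkpoint path
    state = torch.load("cem500k_encoder.pth", map_location="cpu")
    encoder.load_state_dict(state, strict=False)      # tolerate missing/extra keys

# Keep the convolutional trunk (drop avgpool and fc) as the feature extractor.
backbone = nn.Sequential(*list(encoder.children())[:-2])

# Attach a lightweight decoder head for binary segmentation.
decoder = nn.Sequential(
    nn.Conv2d(2048, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
    nn.Conv2d(256, 1, kernel_size=1),
)

model = nn.Sequential(backbone, decoder)
x = torch.randn(1, 3, 224, 224)                       # one EM tile, replicated to 3 channels
print(model(x).shape)                                  # torch.Size([1, 1, 224, 224])
```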
Project description:Motivation: The inherent low contrast of electron microscopy (EM) datasets presents a significant challenge for rapid segmentation of cellular ultrastructures from EM data. This challenge is particularly prominent when working with the high-resolution, large-scale datasets now acquired using electron tomography and serial block-face imaging techniques. Deep learning (DL) methods offer an exciting opportunity to automate the segmentation process by learning from manual annotations of a small sample of EM data. While many DL methods are being rapidly adopted to segment EM data, no benchmark analysis of these methods has been conducted to date. Results: We present EM-stellar, a platform hosted on Google Colab that can be used to benchmark the performance of a range of state-of-the-art DL methods on user-provided datasets. Using EM-stellar, we show that the performance of any DL method depends on the properties of the images being segmented. It also follows that no single DL method performs consistently across all performance evaluation metrics. Availability and implementation: EM-stellar (code and data) is written in Python and is freely available under the MIT license on GitHub (https://github.com/cellsmb/em-stellar). Supplementary information: Supplementary data are available at Bioinformatics online.
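A minimal NumPy sketch of a few of the segmentation metrics a benchmark of this kind reports (Dice, IoU, precision, recall); this is illustrative and not the EM-stellar implementation.

```python
# Common segmentation metrics computed from binary masks.
import numpy as np

def segmentation_metrics(pred, target):
    """pred, target: binary masks (0/1 or bool) of identical shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    eps = 1e-8                                  # avoid division by zero
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
    }

# Toy example: a 4x4 prediction vs. ground truth.
gt = np.array([[0, 1, 1, 0]] * 4)
pr = np.array([[0, 1, 0, 0]] * 4)
print(segmentation_metrics(pr, gt))             # recall 0.5, precision 1.0
```

As the abstract notes, no single method dominates every such metric, so reporting several of them side by side is the point of the benchmark.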
Project description:In this work, we present the CAS Landslide Dataset, a large-scale, multisensor dataset for deep learning-based landslide detection, developed by the Artificial Intelligence Group at the Institute of Mountain Hazards and Environment, Chinese Academy of Sciences (CAS). The dataset aims to address the challenges encountered in landslide recognition. With the increase in landslide occurrences due to climate change and earthquakes, there is a growing need for a precise and comprehensive dataset to support fast and efficient landslide recognition. In contrast to existing datasets, which are limited in size, coverage, sensor type, and resolution, the CAS Landslide Dataset comprises 20,865 images integrating satellite and unmanned aerial vehicle data from nine regions. To ensure reliability and applicability, we establish a robust methodology to evaluate dataset quality. We propose the CAS Landslide Dataset as a benchmark for building landslide identification models and for facilitating the development of deep learning techniques. Researchers can leverage this dataset to obtain enhanced prediction, monitoring, and analysis capabilities, thereby advancing automated landslide detection.
Project description:Purpose: The curation of images using human resources is time intensive but an essential step for developing artificial intelligence (AI) algorithms. Our goal was to develop and implement an AI algorithm for image curation in a high-volume setting. We also explored AI tools that will assist in deploying a tiered approach, in which the AI model labels images and flags potential mislabels for human review. Design: Implementation of an AI algorithm. Participants: Seven-field stereoscopic images from multiple clinical trials. Methods: The 7-field stereoscopic image protocol includes 7 pairs of images from various parts of the central retina along with images of the anterior part of the eye. All images were labeled for field number by reading center graders. The model output included classification of the retinal images into 8 field numbers. Probability scores (0-1) were generated to identify misclassified images, with 1 indicating a high probability of a correct label. Main outcome measures: Agreement of AI prediction with grader classification of field number and the use of probability scores to identify mislabeled images. Results: The AI model was trained and validated on 17,529 images and tested on 3004 images. The pooled agreement of field numbers between grader classification and the AI model was 88.3% (kappa, 0.87). The pooled mean probability score was 0.97 (standard deviation [SD], 0.08) for images for which the graders agreed with the AI-generated labels and 0.77 (SD, 0.19) for images for which the graders disagreed with the AI-generated labels (P < 0.0001). Using receiver operating characteristic curves, a probability score of 0.99 was identified as a cutoff for distinguishing mislabeled images. A tiered workflow using a probability score of < 0.99 as a cutoff would include 27.6% of the 3004 images for human review and reduce the error rate from 11.7% to 1.5%. Conclusions: The implementation of AI algorithms requires measures in addition to model validation. Tools to flag potential errors in the labels generated by AI models will reduce inaccuracies, increase trust in the system, and provide data for continuous model development.
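An illustrative sketch of the tiered curation workflow described above: predictions with a probability score at or above the 0.99 cutoff are accepted automatically, and everything below it is flagged for human review. The data structures and field values here are hypothetical; only the cutoff comes from the abstract.

```python
# Tiered routing of AI-labeled images based on a probability-score cutoff.
from dataclasses import dataclass

REVIEW_CUTOFF = 0.99   # cutoff identified via ROC analysis in the abstract

@dataclass
class CurationResult:
    image_id: str
    predicted_field: int
    probability: float
    needs_human_review: bool

def triage(image_id: str, predicted_field: int, probability: float) -> CurationResult:
    return CurationResult(
        image_id=image_id,
        predicted_field=predicted_field,
        probability=probability,
        needs_human_review=probability < REVIEW_CUTOFF,
    )

# Toy batch: one confident prediction, one routed to a human grader.
for image_id, field, prob in [("img_001", 3, 0.998), ("img_002", 5, 0.81)]:
    print(triage(image_id, field, prob))
```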
Project description:The development of ultrafast detectors for electron microscopy (EM) opens a new door to exploring dynamics of nanomaterials; however, it raises grand challenges for big data processing and storage. Here, we combine deep learning and temporal compressive sensing (TCS) to propose a novel EM big data compression strategy. Specifically, TCS is employed to compress sequential EM images into a single compressed measurement; an end-to-end deep learning network is leveraged to reconstruct the original images. Owing to the significantly improved compression efficiency and built-in denoising capability of the deep learning framework over conventional JPEG compression, compressed videos with a compression ratio of up to 30 can be reconstructed with high fidelity. Using this approach, considerable encoding power, memory, and transmission bandwidth can be saved, allowing it to be deployed to existing detectors. We anticipate the proposed technique will have far-reaching applications in edge computing for EM and other imaging techniques.
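A small NumPy sketch of the temporal compressive sensing forward model referenced above: a burst of frames is modulated by per-frame binary masks and summed into a single compressed measurement. The mask design, frame sizes, and compression ratio here are illustrative, and the deep-learning reconstruction network is not shown.

```python
# TCS forward model: T masked frames collapse into one compressed snapshot.
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 30, 64, 64                          # compression ratio of 30, small frames

frames = rng.random((T, H, W))                # stand-in for a burst of EM frames
masks = rng.integers(0, 2, size=(T, H, W))    # per-frame binary modulation masks

measurement = (masks * frames).sum(axis=0)    # single compressed measurement

print(frames.nbytes / measurement.nbytes)     # ~30x reduction in stored data
```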
Project description:To accurately identify atoms in noisy transmission electron microscope images, a deep learning (DL) approach is employed to estimate, at each pixel, the probability of it belonging to an atom, with discrimination between elements. Thanks to a carefully designed loss function and the networks' ability to extract features, the proposed DL networks can be trained on a small dataset created from approximately 30 experimental images, each with a size of 256 × 256 pixels. The accuracy and robustness of the network were verified by resolving the structural defects of graphene and the polar structures in PbTiO3/SrTiO3 multilayers, from both ordinary TEM images and imitated images in which the intensities of some pixels were randomly dropped. Such a network has the potential to identify atoms from very few images of beam-sensitive materials and from images recorded during rapid, dynamic atomic processes. The idea of using a small-dataset-trained DL framework to resolve a specific problem may prove instructive for practical DL applications in various fields.
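A small sketch of the "imitated image" degradation used in the robustness test above: randomly zero out the intensities of a fraction of pixels. The dropout fraction is illustrative.

```python
# Simulate random pixel-intensity loss on a TEM-sized image.
import numpy as np

rng = np.random.default_rng(42)

def drop_random_pixels(image, fraction=0.2):
    """Return a copy of `image` with roughly `fraction` of its pixels zeroed."""
    degraded = image.copy()
    degraded[rng.random(image.shape) < fraction] = 0
    return degraded

img = rng.random((256, 256))                  # stand-in for a 256 x 256 TEM image
degraded = drop_random_pixels(img)
print((degraded == 0).mean())                 # ~0.2 of pixels lost
```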
Project description:Focused ion beam-scanning electron microscopy (FIB-SEM) images can provide a detailed view of the cellular ultrastructure of tumor cells. A deeper understanding of their organization and interactions can shed light on cancer mechanisms and progression. However, the bottleneck in the analysis is the delineation of the cellular structures to enable quantitative measurements and analysis. We mitigated this limitation using deep learning to segment cells and subcellular ultrastructure in 3D FIB-SEM images of tumor biopsies obtained from patients with metastatic breast and pancreatic cancers. The ultrastructures, such as nuclei, nucleoli, mitochondria, endosomes, and lysosomes, are relatively better defined than their surroundings and can be segmented with high accuracy using a neural network trained with sparse manual labels. Cell segmentation, on the other hand, is much more challenging due to the lack of clear boundaries separating cells in the tissue. We adopted a multi-pronged approach combining detection, boundary propagation, and tracking for cell segmentation. Specifically, a neural network was employed to detect the intracellular space; optical flow was used to propagate cell boundaries across the z-stack from the nearest ground truth image in order to facilitate the separation of individual cells; finally, the filopodium-like protrusions were linked back to their main cells by calculating the intersection-over-union measure for all regions detected in consecutive images along the z-stack and connecting regions with maximum overlap. The proposed cell segmentation methodology resulted in an average Dice score of 0.93. For nuclei, nucleoli, and mitochondria, the segmentation achieved Dice scores of 0.99, 0.98, and 0.86, respectively. The segmentation of FIB-SEM images will enable interpretative rendering and provide quantitative image features to be associated with relevant clinical variables.
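A minimal sketch of the intersection-over-union linking step described above: each labeled region in slice z+1 is connected to the region in slice z with which it overlaps most. The label arrays here are toy masks; the detection network and optical-flow boundary propagation stages are not shown.

```python
# Link regions across consecutive z-slices by maximum intersection-over-union.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def link_regions(labels_z, labels_z1):
    """Map each region id in slice z+1 to the best-overlapping id in slice z."""
    links = {}
    for rid in np.unique(labels_z1):
        if rid == 0:                                  # 0 = background
            continue
        region = labels_z1 == rid
        candidates = np.unique(labels_z[region])
        candidates = candidates[candidates != 0]
        if candidates.size == 0:
            links[int(rid)] = None                    # no overlap in the previous slice
            continue
        best = max(candidates, key=lambda c: iou(region, labels_z == c))
        links[int(rid)] = int(best)
    return links

# Toy consecutive slices with two objects persisting across z.
z0 = np.zeros((8, 8), int); z0[1:4, 1:4] = 1; z0[5:8, 5:8] = 2
z1 = np.zeros((8, 8), int); z1[2:5, 1:4] = 7; z1[5:7, 5:8] = 9
print(link_regions(z0, z1))                           # {7: 1, 9: 2}
```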
Project description:Connectomics is a developing field that aims to reconstruct the connectivity of the nervous system at the nanometer scale. Computer vision technology, especially deep learning methods for image processing, has pushed connectomic data analysis into a new era. However, the performance of state-of-the-art (SOTA) methods still falls behind the demands of scientific research. Inspired by the success of ImageNet, we present U-RISC, an annotated ultra-high-resolution image segmentation dataset for cell membranes, which is the largest cell-membrane-annotated electron microscopy (EM) dataset, with a resolution of 2.18 nm/pixel. Multiple iterative annotation rounds ensured the quality of the dataset. Through an open competition, we reveal that current deep learning methods still fall considerably short of human-level performance on U-RISC, in contrast to ISBI 2012, on which deep learning performance is closer to the human level. To explore the causes of this discrepancy, we analyze the neural networks with an attribution-based visualization method. We find that U-RISC requires a larger area around a pixel to predict whether that pixel belongs to the cell membrane. Finally, we integrate the currently available methods to provide a new benchmark for cell membrane segmentation on U-RISC (a score of 0.67, 10% higher than the competition leader's 0.61) and offer suggestions for developing deep learning algorithms. The U-RISC dataset and the deep learning code used in this study are publicly available.
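A minimal gradient-saliency sketch to illustrate the kind of attribution analysis described above: back-propagating from a single output pixel of a toy fully convolutional network shows which surrounding input pixels influence the membrane decision. This is not the paper's exact attribution method, and the model here is a stand-in.

```python
# Gradient-based attribution for one output pixel of a toy segmentation net.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(                        # tiny fully convolutional network
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)

x = torch.rand(1, 1, 64, 64, requires_grad=True)      # toy EM patch
logits = model(x)

# Attribute the prediction at the central pixel back to the input.
logits[0, 0, 32, 32].backward()
saliency = x.grad.abs().squeeze()

# Count the input pixels with non-negligible influence on that one decision.
influential = (saliency > saliency.max() * 0.01).sum().item()
print(f"{influential} input pixels influence the central prediction")
```

For deeper networks, or tasks that demand more context per pixel as reported for U-RISC, this influential region grows with the receptive field.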
Project description:The success of training computer-vision models relies heavily on large-scale collections of annotated real-world images. Yet such annotation-ready datasets are difficult to curate in pathology due to privacy protections and the excessive annotation burden. To aid computational pathology, synthetic data generation, curation, and annotation offer a cost-effective means of quickly achieving the data diversity required to boost model performance at different stages. In this study, we introduce a large-scale synthetic pathological image dataset paired with annotations for nuclei semantic segmentation, termed the Synthetic Nuclei and annOtation Wizard (SNOW). SNOW is developed via a standardized workflow that applies an off-the-shelf image generator and nuclei annotator. The dataset contains 20k image tiles and 1,448,522 annotated nuclei in total, released under the CC-BY license. We show that SNOW can be used in both supervised and semi-supervised training scenarios. Extensive results suggest that models trained on synthetic data are competitive under a variety of model training settings, expanding the scope of using synthetic images to enhance downstream data-driven clinical tasks.