Project description:Although acknowledged to be variable and subjective, manual annotation of cryo-electron tomography data is commonly used to answer structural questions and to create a "ground truth" for evaluation of automated segmentation algorithms. Validation of such annotation is lacking, but is critical for understanding the reproducibility of manual annotations. Here, we used voxel-based similarity scores for a variety of specimens, ranging in complexity and segmented by several annotators, to quantify the variation among their annotations. In addition, we have identified procedures for merging annotations to reduce variability, thereby increasing the reliability of manual annotation. Based on our analyses, we find that it is necessary to combine multiple manual annotations to increase the confidence level for answering structural questions. We also make recommendations to guide algorithm development for automated annotation of features of interest.
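The voxel-based comparison and merging described above can be illustrated with a minimal sketch. It assumes the Dice coefficient as the similarity score and a simple majority vote as the merging procedure; the abstract names neither, so both are illustrative choices, and the toy masks are made up.

```python
from typing import List, Sequence


def dice_score(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary voxel masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Keep a voxel when more than half of the annotators marked it."""
    n = len(masks)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*masks)]


# Hypothetical annotations of the same six voxels by three annotators.
annotator_a = [1, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 0, 1, 1]
annotator_c = [1, 1, 0, 1, 1, 0]

pairwise = dice_score(annotator_a, annotator_b)
merged = majority_vote([annotator_a, annotator_b, annotator_c])
print(f"Dice(a, b) = {pairwise:.3f}, merged = {merged}")
```

Real annotations would be 3D arrays; flattening them to one list per annotator keeps the example free of dependencies.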
Project description:Motivation: Cryo-electron tomography (cryoET) produces 3D density maps of biological specimens in their near-native state. Applied to small cells, cryoET produces 3D snapshots of the cellular distributions of large complexes. However, retrieving this information is non-trivial due to the low resolution and low signal-to-noise ratio of tomograms. Current pattern recognition methods identify complexes by matching known structures to the cryo-electron tomogram. However, so far only a small fraction of all protein complexes have been structurally resolved. It is, therefore, of great importance to develop template-free methods for the discovery of previously unknown protein complexes in cryo-electron tomograms. Results: Here, we have developed an inference method for the template-free discovery of frequently occurring protein complexes in cryo-electron tomograms. We provide a first proof-of-principle of the approach and assess its applicability using realistically simulated tomograms, allowing for the inclusion of noise and distortions due to the missing wedge and electron optical factors. Our method is a step toward the template-free discovery of the shapes, abundance and spatial distributions of previously unknown macromolecular complexes in whole-cell tomograms. Contact: alber@usc.edu
Project description:Background: Despite recent advances in cellular cryo-electron tomography (CET), developing automated tools for macromolecule identification at submolecular resolution remains challenging due to the lack of annotated data and high structural complexity. To date, the deep learning methods developed for this problem have been limited to conventional Convolutional Neural Networks (CNNs). Identifying macromolecules of different types and sizes is a tedious and time-consuming task. In this paper, we automate macromolecule identification with a capsule-based architecture that we refer to as 3D-UCaps. The architecture is composed of three components: a feature extractor, a capsule encoder, and a CNN decoder. The feature extractor converts voxel intensities of input sub-tomograms to activities of local features. The encoder is a 3D Capsule Network (CapsNet) that takes local features and generates a low-dimensional representation of the input. Then, a 3D CNN decoder reconstructs the sub-tomograms from the given representation by upsampling. Results: We performed binary and multi-class localization and identification tasks on synthetic and experimental data. We observed that the 3D-UNet and the 3D-UCaps had an $F_1$-score mostly above 60% and 70%, respectively, on the test data. In both network architectures, we observed a degradation of at least 40% in the $F_1$-score when identifying very small particles (PDB entry 3GL1) compared to a large particle (PDB entry 4D8Q). In the multi-class identification task on experimental data, 3D-UCaps had an $F_1$-score of 91% on the test data, compared with 64% for the 3D-UNet. The better $F_1$-score of 3D-UCaps compared to 3D-UNet results from a higher precision score. We speculate that this is due to the capsule network employed in the encoder.
To study the effect of the CapsNet-based encoder architecture further, we performed an ablation study and observed that the $F_1$-score improves as network depth increases, in contrast to previously reported results for the 3D-UNet. To make the work reproducible, source code, trained models, data, and visualization results are made publicly available. Conclusion: Quantitative and qualitative results show that 3D-UCaps successfully performs various downstream tasks, including identification and localization of macromolecules, and can at least compete with CNN architectures for this task. Given that the capsule layers extract both the existence probability and the orientation of the molecules, this architecture has the potential to lead to representations of the data that are more interpretable than those of 3D-UNet.
Project description:Cryo-electron tomography (cryo-ET) allows one to observe macromolecular complexes in their native, spatially contextualized environment. Tools to visualize such complexes at nanometer resolution via iterative alignment and averaging are well-developed but rely on assumptions of structural homogeneity among the complexes under consideration. Recently developed downstream analysis tools allow for some assessment of macromolecular diversity but have limited capacity to represent highly heterogeneous macromolecules, including those undergoing continuous conformational changes. Here, we extend the highly expressive cryoDRGN deep learning architecture, originally created for cryo-electron microscopy single particle analysis, to sub-tomograms. Our new tool, tomoDRGN, learns a continuous low-dimensional representation of structural heterogeneity in cryo-ET datasets while also learning to reconstruct a large, heterogeneous ensemble of structures supported by the underlying data. Using simulated and experimental data, we describe and benchmark architectural choices within tomoDRGN that are uniquely necessitated and enabled by cryo-ET data. We additionally illustrate tomoDRGN's efficacy in analyzing an exemplar dataset, using it to reveal extensive structural heterogeneity among ribosomes imaged in situ.
Project description:Cellular electron cryotomography offers researchers the ability to observe macromolecules frozen in action in situ, but a primary challenge with this technique is identifying molecular components within the crowded cellular environment. We introduce a method that uses neural networks to dramatically reduce the time and human effort required for subcellular annotation and feature extraction. Subsequent subtomogram classification and averaging yield in situ structures of molecular components of interest. The method is available in the EMAN2.2 software package.
Project description:Cellular processes are governed by macromolecular complexes inside the cell. Study of the native structures of macromolecular complexes has been extremely difficult due to lack of data. With recent breakthroughs in Cellular Electron Cryo-Tomography (CECT) 3D imaging technology, researchers can now study and understand macromolecular structures inside single cells. However, systematic recovery of macromolecular structures from CECT is very difficult due to the high degree of structural complexity and practical imaging limitations. Previously, we proposed a deep learning-based image classification approach for large-scale systematic macromolecular structure separation from CECT data. However, our previous work was only a very initial step toward exploring the full potential of deep learning-based macromolecule separation. In this paper, we focus on improving classification performance by proposing three newly designed individual CNN models: an extended version of Deep Small Receptive Field (DSRF3D), denoted DSRF3D-v2; a 3D residual block-based neural network, named RB3D; and a convolutional 3D (C3D)-based model, CB3D. We compare them with our previously developed model (DSRF3D) on 12 datasets with different SNRs and tilt angle ranges. The experiments show that our new models achieve significantly higher classification accuracies. The accuracies are not only higher than 0.9 on normal datasets; the models also show potential to operate on datasets with high levels of noise and missing wedge effects.
Project description:Cryo-electron tomography (cryo-ET) provides 3D visualization of subcellular components in the near-native state and at sub-molecular resolution in single cells, and plays an increasingly important role in in situ structural biology. However, systematic recognition and recovery of macromolecular structures in cryo-ET data remain challenging as a result of the low signal-to-noise ratio (SNR), the small sizes of macromolecules, and the high complexity of the cellular environment. Subtomogram structural classification is an essential step for such tasks. Although acquiring large numbers of subtomograms is no longer an obstacle due to advances in the automation of data collection, obtaining the same number of structural labels is both computationally and labor intensive. On the other hand, existing supervised classification approaches based on deep learning are highly demanding of labeled data and have limited ability to learn new structures rapidly from data containing very few labels for them. In this work, we propose a novel approach for subtomogram classification based on few-shot learning. With our approach, structures unseen in the training data can be classified given only a few labeled samples in the test data through instance embedding. Experiments were performed on both simulated and real datasets. Our experimental results show that we can make inferences on new structures given only five labeled samples for each class with a competitive accuracy (> 0.86 on the simulated dataset with SNR = 0.1), or even one sample with an accuracy of 0.7644. The results on real datasets are also promising, with accuracy > 0.9 under both conditions and up to 1 on one of the real datasets. Our approach achieves significant improvement over the baseline method and has strong capabilities of generalizing to other cellular components.
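One common way to classify from a handful of labeled embeddings is a nearest-prototype rule, sketched below. This is an assumption for illustration only: the abstract states that classification works "through instance embedding" but does not specify the decision rule, and the embedding network itself is not shown. The class labels and 2D embedding vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized embedding vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def classify(query: Sequence[float],
             support: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Return the label whose prototype (mean support embedding) is nearest."""
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical embeddings: two labeled samples ("shots") per unseen class.
support_set = {
    "class_a": [[1.0, 0.0], [0.8, 0.2]],
    "class_b": [[0.0, 1.0], [0.2, 0.8]],
}

label_1 = classify([0.7, 0.3], support_set)
label_2 = classify([0.1, 0.6], support_set)
print(label_1, label_2)
```

With one labeled sample per class, each prototype reduces to that single embedding, which corresponds to the one-shot setting.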
Project description:Cryogenic-electron tomography enables the visualization of cellular environments in extreme detail; however, tools to analyze the full amount of information contained within these densely packed volumes are still needed. Detailed analysis of macromolecules through subtomogram averaging requires particles to first be localized within the tomogram volume, a task complicated by several factors including a low signal-to-noise ratio and crowding of the cellular space. Available methods for this task are either error prone or require manual annotation of training data. To assist in this crucial particle picking step, we present TomoTwin: an open source general picking model for cryogenic-electron tomograms based on deep metric learning. By embedding tomograms in an information-rich, high-dimensional space that separates macromolecules according to their three-dimensional structure, TomoTwin allows users to identify proteins in tomograms de novo without manually creating training data or retraining the network to locate new proteins.
Project description:Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization in a near-native state and at sub-molecular resolution, making it a powerful tool for analyzing the structures of macromolecular complexes and their spatial organization inside single cells. However, a high degree of structural complexity together with practical imaging limitations makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Although it is no longer difficult to acquire CECT data containing such numbers of subtomograms due to advances in data acquisition automation, existing computational approaches have very limited scalability or discrimination ability, making them incapable of processing such amounts of data. To complement existing approaches, in this article we propose a new approach for subdividing subtomograms into smaller but relatively homogeneous subsets. The structures in these subsets can then be separately recovered using existing computation-intensive methods. Our approach is based on supervised structural feature extraction using deep learning, in combination with unsupervised clustering and reference-free classification. Our experiments show that, compared with existing unsupervised rotation-invariant feature and pose-normalization based approaches, our new approach achieves significant improvements in both discrimination ability and scalability. More importantly, our new approach is able to discover new structural classes and recover structures that do not exist in the training data. Source code is freely available at http://www.cs.cmu.edu/?mxu1/software. Contact: mxu1@cs.cmu.edu. Supplementary data are available at Bioinformatics online.
Project description:JCVI-syn3A is a genetically minimal bacterial cell, consisting of 493 genes and only a single 543 kbp circular chromosome. Syn3A's genome and physical size are approximately one-tenth those of the model bacterial organism Escherichia coli's, and the corresponding reduction in complexity and scale provides a unique opportunity for whole-cell modeling. Previous work established genome-scale gene essentiality and proteomics data along with its essential metabolic network and a kinetic model of genetic information processing. In addition to that information, whole-cell, spatially-resolved kinetic models require cellular architecture, including spatial distributions of ribosomes and the circular chromosome's configuration. We reconstruct cellular architectures of Syn3A cells at the single-cell level directly from cryo-electron tomograms, including the ribosome distributions. We present a method of generating self-avoiding circular chromosome configurations in a lattice model with a resolution of 11.8 bp per monomer on a 4 nm cubic lattice. Realizations of the chromosome configurations are constrained by the ribosomes and geometry reconstructed from the tomograms and include DNA loops suggested by experimental chromosome conformation capture (3C) maps. Using ensembles of simulated chromosome configurations we predict chromosome contact maps for Syn3A cells at resolutions of 250 bp and greater and compare them to the experimental maps. Additionally, the spatial distributions of ribosomes and the DNA-crowding resulting from the individual chromosome configurations can be used to identify macromolecular structures formed from ribosomes and DNA, such as polysomes and expressomes.
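The lattice figures quoted above can be checked with a short calculation. This sketch uses the rounded 543 kbp genome size from the abstract and assumes a rise of about 0.34 nm per base pair for B-form DNA, a textbook value that the abstract does not state; the variable names are illustrative.

```python
GENOME_BP = 543_000        # rounded genome size quoted in the abstract
BP_PER_MONOMER = 11.8      # lattice-model resolution quoted in the abstract
LATTICE_SPACING_NM = 4.0   # cubic lattice spacing quoted in the abstract
CONTACT_MAP_BIN_BP = 250   # finest contact-map resolution quoted in the abstract
RISE_NM_PER_BP = 0.34      # assumption: approximate rise of B-form DNA

# Number of monomers needed to represent the whole circular chromosome.
n_monomers = round(GENOME_BP / BP_PER_MONOMER)

# Physical length of DNA carried by one monomer; it should be close to the
# lattice spacing if one monomer occupies one lattice step.
monomer_length_nm = BP_PER_MONOMER * RISE_NM_PER_BP

# Contour length of the chromosome when laid out along the lattice.
contour_length_um = n_monomers * LATTICE_SPACING_NM / 1000.0

# Number of bins along each axis of a 250 bp contact map.
n_bins = GENOME_BP // CONTACT_MAP_BIN_BP

print(n_monomers, round(monomer_length_nm, 3), round(contour_length_um, 1), n_bins)
```

Under these assumptions, 11.8 bp corresponds to roughly 4 nm of DNA, which matches the lattice spacing, and the chromosome is represented by about 46,000 monomers.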