Dataset Information

Modeling Image Patches with a Generic Dictionary of Mini-Epitomes.

ABSTRACT: The goal of this paper is to question the necessity of features like SIFT in categorical visual recognition tasks. As an alternative, we develop a generative model for the raw intensity of image patches and show that it can support image classification performance on par with optimized SIFT-based techniques in a bag-of-visual-words setting. A key ingredient of the proposed model is a compact dictionary of mini-epitomes, learned in an unsupervised fashion on a large collection of images. The use of epitomes allows us to explicitly account for photometric and position variability in image appearance. We show that this flexibility considerably increases the capacity of the dictionary to accurately approximate the appearance of image patches and support recognition tasks. For image classification, we develop histogram-based image encoding methods tailored to the epitomic representation, as well as an "epitomic footprint" encoding which is easy to visualize and highlights the generative nature of our model. We discuss computational aspects in detail and develop efficient algorithms to make the model scalable to large tasks. The proposed techniques are evaluated with experiments on the challenging PASCAL VOC 2007 image classification benchmark.

SUBMITTER: Papandreou G

PROVIDER: S-EPMC4550088 | biostudies-literature | 2014 Jun

REPOSITORIES: biostudies-literature

Publications

Modeling Image Patches with a Generic Dictionary of Mini-Epitomes.

Papandreou George, Chen Liang-Chieh, Yuille Alan L

Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2014-06-01

PMID: 26321859

Similar Datasets

Project description: Building automated cancer screening systems based on image analysis is currently a hot topic in the computer vision and medical imaging communities. One of the biggest challenges of such systems, especially those using state-of-the-art deep learning techniques, is that they usually require a large amount of training data to be accurate. However, in the medical field, the confidentiality of the data and the need for medical expertise to label them significantly reduce the amount of training data available. A common practice to overcome this problem is to apply data set augmentation techniques to artificially increase the size of the training data set. Classical data set augmentation methods such as geometrical or color transformations are efficient but still produce a limited amount of new data. Hence, there has been interest in data set augmentation methods using generative models able to synthesize a wider variety of new data. VitaDX is currently developing an automated bladder cancer screening system based on the analysis of cell images contained in urinary cytology digital slides. Currently, the number of available labeled cell images is limited and therefore exploitation of the full potential of deep learning techniques is not possible. In an attempt to increase the number of labeled cell images, a new generic generator for 2D cell images has been developed and is described in this article. This framework combines previous works on cell image generation and a recent style transfer method referred to as doodle-style transfer in this article. To the best of our knowledge, we are the first to use a doodle-style transfer method for synthetic cell image generation. This framework is quite modular and could be applied to other cell image generation problems. A statistical evaluation has shown that features of real and synthetic cell images followed roughly the same distribution. Finally, the realism of the synthetic cell images has been assessed through a visual evaluation performed with the help of medical experts. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.

Project description: Despite the rapid development of X-ray cone-beam CT (CBCT), image noise remains a major issue for low-dose CBCT. To suppress the noise effectively while retaining the structures well in low-dose CBCT images, in this paper a sparse constraint based on the 3-D dictionary is incorporated into a regularized iterative reconstruction framework, defining the 3-D dictionary learning (3-DDL) method. In addition, by analyzing the sparsity level curve associated with different regularization parameters, a new adaptive parameter selection strategy is proposed to facilitate our 3-DDL method. To justify the proposed method, we first analyze the distributions of the representation coefficients associated with the 3-D dictionary and the conventional 2-D dictionary to compare their efficiencies in representing volumetric images. Then, multiple real data experiments are conducted for performance validation. Based on these results, we found: 1) the 3-D dictionary-based sparse coefficients have a three orders narrower Laplacian distribution compared with the 2-D dictionary, suggesting the higher representation efficiency of the 3-D dictionary; 2) the sparsity level curve demonstrates a clear Z-shape, and is hence referred to as the Z-curve in this paper; 3) the parameter associated with the maximum curvature point of the Z-curve suggests a nice parameter choice, which could be adaptively located with the proposed Z-index parameterization (ZIP) method; 4) the proposed 3-DDL algorithm equipped with the ZIP method could deliver reconstructions with the lowest root mean squared errors and the highest structural similarity index compared with the competing methods; 5) noise performance similar to that of the regular-dose FDK reconstruction regarding the standard deviation metric could be achieved with the proposed method using (1/2)/(1/4)/(1/8) dose level projections. The contrast-noise ratio is improved by ~2.5/3.5 times with respect to two different cases under the (1/8) dose level compared with the low-dose FDK reconstruction. The proposed method is expected to reduce the radiation dose by a factor of 8 for CBCT, considering the voted strongly discriminated low contrast tissues.

Project description: Neurons in visual area V4 modulate their responses depending on the figure-ground (FG) organization in natural images containing a variety of shapes and textures. To clarify whether the responses depend on the extents of the figure and ground regions in and around the classical receptive fields (CRFs) of the neurons, we estimated the spatial extent of local figure and ground regions that evoked FG-dependent responses (RF-FGs) in natural images and their variants. Specifically, we applied the framework of spike triggered averaging (STA) to the combinations of neural responses and human-marked segmentation images (FG labels) that represent the extents of the figure and ground regions in the corresponding natural image stimuli. FG labels were weighted by the spike counts in response to the corresponding stimuli and averaged over. The bias due to the nonuniformity of FG labels was compensated by subtracting the ensemble average of FG labels from the weighted average. Approximately 50% of the neurons showed effective RF-FGs, and a large number exhibited structures that were similar to those observed in virtual neurons with ideal FG-dependent responses. The structures of the RF-FGs exhibited a subregion responsive to a preferred side (figure or ground) around the CRF center and a subregion responsive to a non-preferred side in the surroundings. The extents of the subregions responsive to figure were smaller than those responsive to ground in agreement with the Gestalt rule. We also estimated RF-FG by an adaptive filtering (AF) method, which does not require spherical symmetry (whiteness) in stimuli. RF-FGs estimated by AF and STA exhibited similar structures, supporting the veridicality of the proposed STA. To estimate the contribution of nonlinear processing in addition to linear processing, we estimated nonlinear RF-FGs based on the framework of spike triggered covariance (STC). The analyses of the models based on STA and STC did not show inconsiderable contribution of nonlinearity, suggesting spatial variance of FG regions. The results lead to an understanding of the neural responses that underlie the segregation of figures and the construction of surfaces in intermediate-level visual areas.
