Dataset Information

Visual Saliency Models for Text Detection in Real World.

ABSTRACT: This paper evaluates the degree of saliency of texts in natural scenes using visual saliency models. For this purpose, a large-scale scene image database with pixel-level ground truth is created. Using this database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of objects are computed. The receiver operating characteristic (ROC) curve is then used to evaluate the saliency of scene texts as computed by the visual saliency models. The paper also visualizes the distribution of scene texts and non-texts in the space spanned by three kinds of saliency maps, computed with Itti's visual saliency model using intensity, color, and orientation features. This visualization indicates that text characters are more salient than their non-text neighbors and can be distinguished from the background. Therefore, scene texts can be extracted from scene images. With this in mind, a new visual saliency architecture, named the hierarchical visual saliency model, is proposed. The hierarchical visual saliency model is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region of interest. In the second stage, Itti's model is applied to that salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.
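The two-stage procedure and the ROC-based evaluation described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `local_contrast` is a hypothetical toy stand-in for Itti's model (which combines intensity, color, and orientation feature maps); stage 2 is assumed to recompute saliency on the image with every pixel outside the Otsu mask set to zero; and `roc_auc` computes the area under the ROC curve from pixel-level text/non-text labels using the equivalent Mann-Whitney pairwise statistic.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def local_contrast(img: Image, radius: int = 1) -> Image:
    """Hypothetical stand-in for Itti's model: |pixel - local mean|."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            patch = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = abs(img[y][x] - sum(patch) / len(patch))
    return out


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """Otsu's method: the threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(n * c for n, c in zip(hist, centers))
    w0, sum0, best_var, best_t = 0, 0.0, -1.0, lo
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized; same argmax
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width
    return best_t


def hierarchical_saliency(img: Image,
                          saliency_fn: Callable[[Image], Image] = local_contrast
                          ) -> Image:
    h, w = len(img), len(img[0])
    # Stage 1: saliency of the whole image, binarized with Otsu's method.
    s1 = saliency_fn(img)
    t = otsu_threshold([v for row in s1 for v in row])
    mask = [[v >= t for v in row] for row in s1]
    # Stage 2 (assumption): zero everything outside the salient region,
    # recompute saliency, and keep the result only inside the region.
    region = [[img[y][x] if mask[y][x] else 0.0 for x in range(w)]
              for y in range(h)]
    s2 = saliency_fn(region)
    return [[s2[y][x] if mask[y][x] else 0.0 for x in range(w)]
            for y in range(h)]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC = P(text pixel scores higher than non-text pixel); ties = 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scene: a bright 2x2 "text" block on a dark 6x6 background.
    img = [[0.0] * 6 for _ in range(6)]
    for y in (2, 3):
        for x in (2, 3):
            img[y][x] = 1.0
    gt = [1 if img[y][x] > 0 else 0 for y in range(6) for x in range(6)]
    final = hierarchical_saliency(img)
    print(roc_auc([v for row in final for v in row], gt))
```

Any function that maps an image to a same-sized saliency map can be passed as `saliency_fn`, so a real implementation of Itti's model could replace the toy stand-in without changing the two-stage structure.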

SUBMITTER: Gao R

PROVIDER: S-EPMC4262416 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

Visual Saliency Models for Text Detection in Real World.

Renwu Gao, Seiichi Uchida, Asif Shahab, Faisal Shafait, Volkmar Frinken

PLoS ONE, 2014-12-10, issue 12


PMID: 25494196

Similar Datasets

Project description: Visual neural plasticity and V1 saliency detection are vital for efficient coding of dynamically changing visual inputs. However, how neural plasticity contributes to saliency detection in a visual stream with a given temporal-statistical distribution remains unclear. We therefore presented randomly ordered but unevenly distributed stimuli with multiple orientations and, using single-unit recordings in the visual thalamo-ventral pathway of cats (of either sex), examined the responses evoked by this biased orientation-adaptation protocol. We found that neuronal responses were potentiated when the probability of the biased orientation was slightly higher than that of the non-biased ones, and suppressed when the probability became much higher. This single-neuron short-term bidirectional plasticity is selectively induced by optimal stimuli, but is interocularly transferable. It is inducible in the LGN, Area 17, and Area 21a, with distinct and hierarchically progressive patterns. Based on latency analysis, receptive-field structure tests, cortical lesions, and simulations, we suggest that this bidirectional plasticity may originate principally from adaptation-competition between the excitatory and inhibitory components of V1 neuronal receptive fields. In our simulation, this bidirectional plasticity could achieve saliency detection of dynamic visual inputs. These findings demonstrate a rapid, probability-dependent plasticity in the neural coding of visual streams and suggest its functional role in efficient coding and saliency detection in dynamic environments.
SIGNIFICANCE STATEMENT: Novel elements within a dynamic visual stream can "pop up" from the context, which is vital for rapid responses to a dynamically changing world. Saliency detection is a promising bottom-up mechanism contributing to the efficient selection of visual inputs, and visual adaptation also plays a significant role in it. However, saliency detection in dynamic visual streams is poorly understood. Here we found a novel form of visual short-term bidirectional plasticity at multiple stages of the visual system that contributes to saliency detection of dynamic visual inputs. This bidirectional plasticity may originate principally from the local balance of excitation and inhibition in the primary visual cortex, and it propagates to lower and higher visual areas with a progressive change in pattern. Our findings suggest that the excitation-inhibition balance within the visual system contributes to efficient visual coding.

Project description:
Background: During visual exploration or free viewing, gaze positioning is largely determined by the tendency to maximize visual saliency: more salient locations are more likely to be fixated. However, when visual input is completely irrelevant to performance, as in non-visual tasks, this saliency-maximization strategy may be less advantageous and potentially even disruptive to task performance. Here, we examined whether visual saliency remains a strong driving force in determining gaze positions even in non-visual tasks. We tested three alternative hypotheses: a) that saliency is disadvantageous for non-visual tasks, so gaze would tend to shift away from it and towards non-salient locations; b) that saliency is irrelevant during non-visual tasks, so gaze would be directed neither towards it nor away from it; c) that saliency maximization is a strong behavioral drive that would prevail even during non-visual tasks.
Methods: Gaze position was monitored while participants performed visual or non-visual tasks and were presented with complex or simple images. The effect of attentional demands was examined by comparing an easy non-visual task with a more difficult one.
Results: Exploratory behavior was evident regardless of task difficulty, even when the task was non-visual and the visual input was entirely irrelevant. The observed exploratory behaviors included a strong tendency to fixate salient locations, a central fixation bias, and a gradual reduction in saliency for later fixations. These exploratory behaviors were spatially similar to those of an explicit visual exploration task, but they were nevertheless attenuated. Temporal differences were also found: the non-visual task produced longer fixations and later first fixations than the visual task, reflecting slower visual sampling.
Conclusion: We conclude that in the presence of a rich visual environment, visual exploration is evident even when there is no explicit instruction to explore. Compared with visually motivated tasks, exploration in non-visual tasks follows similar selection mechanisms but occurs at a lower rate. This is consistent with the view that the non-visual task is the equivalent of a dual task: it combines the instructed task with an uninstructed, perhaps even mandatory, exploratory behavior.

Project description: Under fast viewing conditions, the visual system extracts salient and simplified representations of complex visual scenes. Saccadic eye movements optimize such visual analysis through the dynamic sampling of the most informative and salient regions in the scene. However, a general definition of saliency, as well as its role in natural active vision, is still a matter of debate. Following the idea that visual saliency may be based on the amount of local information, a recent constrained maximum-entropy model of early vision, applied to natural images, extracts a set of optimal local information carriers as candidate salient features. These optimal features proved to be more informative than others in fast vision when embedded in simplified sketches of natural images. In the present study, these features were presented in isolation for the first time, to investigate whether they can be visually more salient than other, non-optimal features even in the absence of any meaningful global arrangement (contour, line, etc.). In four psychophysics experiments, the fast discriminability of a compound of optimal features (target) compared with a similar compound of non-optimal features (distractor) was measured as a function of their number and contrast. Results showed that the saliency predictions of the constrained maximum-entropy model are well verified by the data, even when the optimal features are presented in smaller numbers or at lower contrast. In the eye-movement experiment, the target and distractor compounds were presented in the periphery at different angles, and participants performed a simple choice-saccade task. Results showed that saccades can select informative optimal features spatially interleaved with non-optimal features even at the shortest latencies. Saccade choice accuracy and landing-position precision improved with SNR.
In conclusion, the optimal features predicted by the reference model turn out to be more salient than others despite the lack of any cues from a meaningful global structure, suggesting that they receive preferential treatment during fast image analysis. In addition, fast peripheral visual processing of these informative local features is able to guide gaze orientation. We speculate that active vision is efficiently adapted to maximize information in natural visual scenes.
