Dataset Information

Use of superordinate labels yields more robust and human-like visual representations in convolutional neural networks.


ABSTRACT: Human visual recognition is remarkably robust. People can recognize thousands of object classes in the blink of an eye (50-200 ms), even when objects vary in position, scale, viewpoint, and illumination. What aspects of human category learning facilitate the extraction of invariant visual features for object recognition? Here, we explore the possibility that one contributing factor to learning such robust visual representations is the taxonomic hierarchy communicated, in part, by the common labels to which people are exposed in natural language. We tested this by manipulating the taxonomic level of labels (e.g., superordinate level [mammal, fruit, vehicle] and basic level [dog, banana, van]) and the order in which these labels were presented during training of a convolutional neural network. We found that training the model with hierarchical labels yields visual representations that are more robust to image transformations (e.g., position/scale, illumination, noise, and blur), especially when the model was first trained with superordinate labels and then fine-tuned with basic-level labels. We also found that superordinate-label followed by basic-label training best predicts functional magnetic resonance imaging responses in visual cortex and behavioral similarity judgments recorded while viewing naturalistic images. The benefits of training with superordinate labels in the earlier stages of category learning are discussed in the context of representational efficiency and generalization.
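
The two-stage curriculum described in the abstract (train on coarse superordinate labels first, then fine-tune on basic-level labels) can be illustrated with a minimal sketch. This is a hypothetical PyTorch reconstruction, not the authors' code: the resnet18 backbone, the synthetic data, the class counts, and the hyperparameters are all illustrative assumptions.

```python
# Hypothetical sketch of superordinate-then-basic curriculum training.
# Backbone, data, class counts, and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

N_SUPER, N_BASIC = 3, 30  # e.g., {mammal, fruit, vehicle} -> {dog, banana, van, ...}

def train_stage(model, loader, epochs, lr):
    """One curriculum stage: plain cross-entropy classification."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Synthetic stand-in data; in practice these would be labeled image datasets
# where each image carries both a superordinate and a basic-level label.
images = torch.randn(64, 3, 224, 224)
super_labels = torch.randint(0, N_SUPER, (64,))
basic_labels = torch.randint(0, N_BASIC, (64,))

# Stage 1: learn coarse distinctions with superordinate labels.
model = models.resnet18(num_classes=N_SUPER)
train_stage(model, DataLoader(TensorDataset(images, super_labels), batch_size=16),
            epochs=1, lr=0.01)

# Stage 2: swap the classifier head for basic-level classes and fine-tune,
# reusing the backbone features shaped by the coarse stage.
model.fc = nn.Linear(model.fc.in_features, N_BASIC)
train_stage(model, DataLoader(TensorDataset(images, basic_labels), batch_size=16),
            epochs=1, lr=0.001)
```

In a sketch like this, only the final classification layer is replaced between stages, so any robustness benefit of the coarse stage would be carried by the shared convolutional features.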

SUBMITTER: Ahn S 

PROVIDER: S-EPMC8727315 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC7696441 | biostudies-literature
| S-EPMC8024324 | biostudies-literature
| S-EPMC5314376 | biostudies-other
| S-EPMC6545397 | biostudies-literature
| S-EPMC7601099 | biostudies-literature