Dataset Information

Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure.

ABSTRACT: In face-to-face communication, audio-visual (AV) stimuli can be fused, combined or perceived as mismatching. While the left superior temporal sulcus (STS) is presumably the locus of AV integration, the process leading to combination is unknown. Based on previous modelling work, we hypothesize that combination results from a complex dynamic originating in a failure to integrate AV inputs, followed by a reconstruction of the most plausible AV sequence. In two different behavioural tasks and one MEG experiment, we observed that combination is more time demanding than fusion. Using time-/source-resolved human MEG analyses with linear and dynamic causal models, we show that both fusion and combination involve early detection of AV incongruence in the STS, whereas combination is further associated with enhanced activity of AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on neural signal decoding, we finally show that only combination can be decoded from the IFG activity and that combination is decoded later than fusion in the STS. These results indicate that the AV speech integration outcome primarily depends on whether the STS converges or not onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably the off-line re-ordering of incongruent AV stimuli.

SUBMITTER: Bouton S

PROVIDER: S-EPMC7583249 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

Publications

Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure.

Bouton S, Delgado-Saa J, Olasagasti I, Giraud AL

Scientific Reports, 2020 Oct 22, issue 1

PMID: 33093570

Similar Datasets

Project description: Background: This paper describes a web based tool that uses a combination of sonification and an animated display to inquire into the SARS-CoV-2 genome. The audio data is generated in real time from a variety of RNA motifs that are known to be important in the functioning of RNA. Additionally, metadata relating to RNA translation and transcription has been used to shape the auditory and visual displays. Together these tools provide a unique approach to further understand the metabolism of the viral RNA genome. This audio provides a further means to represent the function of the RNA in addition to traditional written and visual approaches. Results: Sonification of the SARS-CoV-2 genomic RNA sequence results in a complex auditory stream composed of up to 12 individual audio tracks. Each auditory motive is derived from the actual RNA sequence or from metadata. This approach has been used to represent transcription or translation of the viral RNA genome. The display highlights the real-time interaction of functional RNA elements. The sonification of codons derived from all three reading frames of the viral RNA sequence in combination with sonified metadata provide the framework for this display. Functional RNA motifs such as transcription regulatory sequences and stem loop regions have also been sonified. Using the tool, audio can be generated in real-time from either genomic or sub-genomic representations of the RNA. Given the large size of the viral genome, a collection of interactive buttons has been provided to navigate to regions of interest, such as cleavage regions in the polyprotein, untranslated regions or each gene. These tools are available through an internet browser and the user can interact with the data display in real time. Conclusion: The auditory display in combination with real-time animation of the process of translation and transcription provide a unique insight into the large body of evidence describing the metabolism of the RNA genome. Furthermore, the tool has been used as an algorithmic based audio generator. These audio tracks can be listened to by the general community without reference to the visual display to encourage further inquiry into the science.

Project description: In order to parse the world around us, we must constantly determine which sensory inputs arise from the same physical source and should therefore be perceptually integrated. Temporal coherence between auditory and visual stimuli drives audio-visual (AV) integration, but the role played by AV spatial alignment is less well understood. Here, we manipulated AV spatial alignment and collected electroencephalography (EEG) data while human subjects performed a free-field variant of the "pip and pop" AV search task. In this paradigm, visual search is aided by a spatially uninformative auditory tone, the onsets of which are synchronized to changes in the visual target. In Experiment 1, tones were either spatially aligned or spatially misaligned with the visual display. Regardless of AV spatial alignment, we replicated the key pip and pop result of improved AV search times. Mirroring the behavioral results, we found an enhancement of early event-related potentials (ERPs), particularly the auditory N1 component, in both AV conditions. We demonstrate that both top-down and bottom-up attention contribute to these N1 enhancements. In Experiment 2, we tested whether spatial alignment influences AV integration in a more challenging context with competing multisensory stimuli. An AV foil was added that visually resembled the target and was synchronized to its own stream of synchronous tones. The visual components of the AV target and AV foil occurred in opposite hemifields; the two auditory components were also in opposite hemifields and were either spatially aligned or spatially misaligned with the visual components to which they were synchronized. Search was fastest when the auditory and visual components of the AV target (and the foil) were spatially aligned. Attention modulated ERPs in both spatial conditions, but importantly, the scalp topography of early evoked responses shifted only when stimulus components were spatially aligned, signaling the recruitment of different neural generators likely related to multisensory integration. These results suggest that AV integration depends on AV spatial alignment when stimuli in both modalities compete for selective integration, a common scenario in real-world perception.

Project description: In an ever-changing environment, crossmodal recalibration is crucial to maintain precise and coherent spatial estimates across different sensory modalities. Accordingly, it has been found that perceived auditory space is recalibrated toward vision after consistent exposure to spatially misaligned audio-visual stimuli (VS). While this so-called ventriloquism aftereffect (VAE) yields internal consistency between vision and audition, it does not necessarily lead to consistency between the perceptual representation of space and the actual environment. For this purpose, feedback about the true state of the external world might be necessary. Here, we tested whether the size of the VAE is modulated by external feedback and reward. During adaptation audio-VS with a fixed spatial discrepancy were presented. Participants had to localize the sound and received feedback about the magnitude of their localization error. In half of the sessions the feedback was based on the position of the VS and in the other half it was based on the position of the auditory stimulus. An additional monetary reward was given if the localization error fell below a certain threshold that was based on participants' performance in the pretest. As expected, when error feedback was based on the position of the VS, auditory localization during adaptation trials shifted toward the position of the VS. Conversely, feedback based on the position of the auditory stimuli reduced the visual influence on auditory localization (i.e., the ventriloquism effect) and improved sound localization accuracy. After adaptation with error feedback based on the VS position, a typical auditory VAE (but no visual aftereffect) was observed in subsequent unimodal localization tests. By contrast, when feedback was based on the position of the auditory stimuli during adaptation, no auditory VAE was observed in subsequent unimodal auditory trials. Importantly, in this situation no visual aftereffect was found either. As feedback did not change the physical attributes of the audio-visual stimulation during adaptation, the present findings suggest that crossmodal recalibration is subject to top-down influences. Such top-down influences might help prevent miscalibration of audition toward conflicting visual stimulation in situations in which external feedback indicates that visual information is inaccurate.
