Modulating Cortical Instrument Representations During Auditory Stream Segregation and Integration With Polyphonic Music.
ABSTRACT: Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced relative to irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound-envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained with selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction accuracy for the relevant instrument during a middle-latency window for both the bassoon and the cello, and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that, subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas no such enhancement is observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and further inform current theories of polyphonic music perception.
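As a concrete illustration of the envelope-tracking approach, the sketch below trains a backward (decoder) model that reconstructs a sound envelope from time-lagged multichannel EEG via ridge regression, in the spirit of mTRF-style analyses. All data shapes, the lag range, and the regularization parameter are hypothetical placeholders, not the study's settings.

```python
import numpy as np

def lagged_design(eeg, max_lag):
    """Stack time-lagged copies of each EEG channel (lags 0..max_lag samples)."""
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * n_channels:(lag + 1) * n_channels] = eeg[:n_samples - lag]
    return X

def train_decoder(eeg, envelope, max_lag=32, lam=1e3):
    """Ridge regression mapping lagged EEG to the sound envelope (backward model)."""
    X = lagged_design(eeg, max_lag)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(eeg, weights, max_lag=32):
    return lagged_design(eeg, max_lag) @ weights

# Toy usage: 16-channel EEG, 60 s at 64 Hz; reconstruction accuracy is the
# Pearson correlation between reconstructed and actual envelopes.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((60 * 64, 16))
envelope = rng.standard_normal(60 * 64)
w = train_decoder(eeg, envelope)
r = np.corrcoef(reconstruct(eeg, w), envelope)[0, 1]
print(f"reconstruction accuracy r = {r:.3f}")
```

With random data the correlation hovers near zero; with real EEG, higher reconstruction accuracy for the attended instrument is the attentional effect reported above.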
Project description: Polyphonic music listening exemplifies processes typically involved in everyday auditory scene analysis, relying on an interactive interplay between bottom-up and top-down processes. Most studies investigating scene analysis have used elementary auditory scenes; real-world scene analysis, however, is far more complex. In particular, music, unlike most other natural auditory scenes, can be perceived by either integrating or, under attentive control, segregating sound streams, often carried by different instruments. One of the prominent bottom-up cues contributing to multi-instrument music perception is the timbre difference between instruments. In this work, we introduce and validate a novel paradigm designed to investigate, within naturalistic musical auditory scenes, attentive modulation as well as its interaction with bottom-up processes. Two psychophysical experiments are described, employing custom-composed two-voice polyphonic music pieces within a framework that implements a behavioral performance metric to validate listener instructions requiring either integration or segregation of scene elements. In Experiment 1, the listeners' locus of attention was switched between individual instruments or the aggregate (i.e., both instruments together) via a task requiring the detection of temporal modulations (i.e., triplets) incorporated within or across instruments. Subjects reported post-stimulus whether triplets were present in the to-be-attended instrument(s). Experiment 2 introduced a bottom-up manipulation by adding a three-level morphing of instrument timbre distance to the attentional framework. The task was designed for use within neuroimaging paradigms; Experiment 2 was additionally validated behaviorally in the functional magnetic resonance imaging (fMRI) environment. Experiment 1 subjects (N = 29, non-musicians) completed the task at high levels of accuracy, with no group differences between any experimental conditions. Nineteen listeners also participated in Experiment 2, showing a main effect of instrument timbre distance, even though within-attention-condition timbre-distance contrasts did not demonstrate any timbre effect. Correlating overall scores with morph-distance effects, computed as the difference between scores at the largest and smallest timbre distances, showed an influence of general task difficulty on the timbre-distance effect. Comparison of laboratory and fMRI data showed that scanner noise had no adverse effect on task performance. These experimental paradigms enable the study of both bottom-up and top-down contributions to auditory stream segregation and integration in psychophysical and neuroimaging experiments.
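The difference-score analysis mentioned above can be summarized in a few lines. The sketch below assumes hypothetical per-subject accuracies at the smallest and largest timbre distances and correlates the resulting morph-distance effect with overall performance; none of the numbers come from the study.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
overall = rng.uniform(0.7, 0.95, size=19)             # overall task accuracy
acc_small = overall - rng.uniform(0.0, 0.10, size=19) # smallest timbre distance
acc_large = overall + rng.uniform(0.0, 0.05, size=19) # largest timbre distance

morph_effect = acc_large - acc_small   # per-subject timbre-distance benefit
r, p = pearsonr(overall, morph_effect)
print(f"r = {r:.2f}, p = {p:.3f}")
```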
Project description: The purpose of this study was to investigate the roles of spectral overlap and amplitude-modulation (AM) rate in stream segregation for noise signals, and to test the build-up effect based on these two cues. Segregation ability was evaluated using an objective paradigm with listeners' attention focused on stream segregation. Stimulus sequences consisted of two interleaved sets of bandpass noise bursts (A and B bursts). The A and B bursts differed in spectrum, AM rate, or both, and the size of the difference between the two sets was varied. Long and short sequences were studied to investigate the build-up effect for segregation based on spectral and AM-rate differences. Results showed the following: (1) stream segregation ability increased with greater spectral separation; (2) larger AM-rate separations were associated with stronger segregation; (3) spectral separation elicited the build-up effect across the range of spectral differences assessed; and (4) AM-rate separation interacted with spectral separation, suggesting an additive effect of the two cues on segregation build-up. The findings suggest that, when normal-hearing listeners direct their attention towards segregation, they are able to segregate auditory streams based on reduced spectral contrast cues that vary with the amount of spectral overlap. Further, regardless of the spectral separation, they can use AM-rate differences as a secondary, weaker cue. Based on spectral differences, listeners segregate auditory streams better as the listening duration is prolonged; that is, sparse spectral cues elicit build-up segregation. AM-rate differences, however, only appear to elicit build-up in combination with spectral difference cues.
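For illustration, the following sketch generates an interleaved two-set burst sequence of the general kind described: bandpass Gaussian noise bursts differing in passband and AM rate, arranged here in a classic A-B-A- triplet layout. All parameter values are placeholders rather than the study's stimuli.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 44100  # sampling rate (Hz)

def am_noise_burst(dur, band, am_rate):
    """Bandpass Gaussian noise, sinusoidally amplitude-modulated at am_rate Hz."""
    t = np.arange(int(dur * FS)) / FS
    noise = np.random.default_rng().standard_normal(t.size)
    sos = butter(4, band, btype="bandpass", fs=FS, output="sos")
    return sosfiltfilt(sos, noise) * (1 + np.sin(2 * np.pi * am_rate * t)) / 2

def aba_sequence(n_triplets, band_a, band_b, am_a, am_b, burst_dur=0.05):
    """Interleave A and B bursts (A-B-A-silence), as in classic streaming designs."""
    gap = np.zeros(int(burst_dur * FS))
    triplet = np.concatenate([am_noise_burst(burst_dur, band_a, am_a), gap,
                              am_noise_burst(burst_dur, band_b, am_b), gap,
                              am_noise_burst(burst_dur, band_a, am_a), gap, gap])
    return np.tile(triplet, n_triplets)

# Example: partial spectral overlap plus a 30 vs. 100 Hz AM-rate difference.
seq = aba_sequence(10, band_a=(1000, 2000), band_b=(1500, 3000), am_a=30, am_b=100)
```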
Project description: Previous research has shown that musical beat perception is a surprisingly complex phenomenon involving widespread neural coordination across higher-order sensory, motor and cognitive areas. However, the question of how low-level auditory processing must necessarily shape these dynamics, and therefore perception, is not well understood. Here, we present evidence that the auditory cortical representation of music, even in the absence of motor or top-down activations, already favours the beat that will be perceived. Extracellular firing rates in the rat auditory cortex were recorded in response to 20 musical excerpts diverse in tempo and genre, for which musical beat perception had been characterized by the tapping behaviour of 40 human listeners. We found that firing rates in the rat auditory cortex were on average higher on the beat than off the beat. This 'neural emphasis' distinguished the beat that was perceived from other possible interpretations of the beat, was predictive of the degree of tapping consensus across human listeners, and was accounted for by a spectrotemporal receptive field model. These findings strongly suggest that the 'bottom-up' processing of music performed by the auditory system predisposes the timing and clarity of the perceived musical beat.
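The core 'neural emphasis' measure can be illustrated as an on-beat minus off-beat firing-rate contrast. The sketch below uses hypothetical spike times, a regular beat grid standing in for the tap-derived beat, and an arbitrary window width.

```python
import numpy as np

def mean_rate(spike_times, event_times, half_win=0.05):
    """Mean firing rate (spikes/s) in windows of +/- half_win s around events."""
    counts = [np.sum((spike_times >= t - half_win) & (spike_times < t + half_win))
              for t in event_times]
    return np.mean(counts) / (2 * half_win)

rng = np.random.default_rng(2)
spikes = np.sort(rng.uniform(0, 30, size=600))   # 30 s of hypothetical spiking
beats = np.arange(0.5, 30, 0.5)                  # 120 BPM beat grid
offbeats = beats + 0.25                          # midpoints between beats

emphasis = mean_rate(spikes, beats) - mean_rate(spikes, offbeats)
print(f"neural emphasis = {emphasis:.2f} spikes/s")
```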
Project description: Auditory stream segregation is a perceptual process by which the human auditory system groups sounds from different sources into perceptually meaningful elements (e.g., a voice or a melody). The perceptual segregation of sounds is important, for example, for the understanding of speech in noisy scenarios, a particularly challenging task for listeners with a cochlear implant (CI). It has been suggested that some aspects of stream segregation may be explained by relatively basic neural mechanisms at a cortical level. During the past decades, a variety of models have been proposed to account for the data from stream segregation experiments in normal-hearing (NH) listeners. However, little attention has been given to corresponding findings in CI listeners. The present study investigated whether a neural model of sequential stream segregation, proposed to describe the behavioral effects observed in NH listeners, can account for behavioral data from CI listeners. The model operates on the stimulus features at the cortical level and includes a competition stage between the neuronal units encoding the different percepts. The competition arises from a combination of mutual inhibition, adaptation, and additive noise. The model was found to capture the main trends in the behavioral data from CI listeners, such as the larger probability of a segregated percept with increasing feature difference between the sounds, as well as the build-up effect. Importantly, this was achieved without any modification to the model's competition stage, suggesting that stream segregation could be mediated by a similar mechanism in both groups of listeners.
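A minimal simulation of this model class is sketched below, assuming two percept-encoding units with threshold-linear dynamics, mutual inhibition, slow adaptation, and additive noise; parameters are illustrative and not fitted to CI data.

```python
import numpy as np

def simulate(T=60.0, dt=1e-3, inp=(1.0, 1.0), beta=1.8, tau=0.01,
             tau_a=2.0, g_a=0.6, sigma=0.05, seed=0):
    """Two percept units competing via mutual inhibition, adaptation, noise."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    r = np.zeros((n, 2))          # firing rates of the two percept units
    a = np.zeros(2)               # slow adaptation variables
    inp = np.asarray(inp, dtype=float)
    for t in range(1, n):
        noise = sigma * rng.standard_normal(2) / np.sqrt(dt)
        drive = inp - beta * r[t - 1, ::-1] - g_a * a + noise  # cross-inhibition
        r[t] = r[t - 1] + dt * (-r[t - 1] + np.clip(drive, 0, None)) / tau
        a += dt * (-a + r[t]) / tau_a
    return r

rates = simulate()
dominant = np.argmax(rates, axis=1)        # which percept wins at each step
print(f"{np.sum(np.diff(dominant) != 0)} dominance switches in 60 s")
```

Adaptation slowly weakens the currently dominant unit while noise triggers switches, which is how this model class produces alternating integrated and segregated percepts.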
Project description: A data flow is presented for visualising the evolution of elementary structures of polyphonic music from early Baroque to late Romantic, using quasi-phylogenies based on fingerprint diagrams and barcode sequence data of 2-tuples of consecutive vertical pitch class sets (pcs). The present methodological study, which sees itself as a proof of concept for a data-driven approach, uses examples of music from the Baroque, the Viennese School and the Romantic era to show that such quasi-phylogenies can be generated from multi-track MIDI (v. 1) files, and that they largely correspond to the eras and the chronology of compositions and composers. The method presented is considered to have the potential to support the analysis of a wide range of musicological questions. In the context of collaborative work on quasi-phylogenies of polyphonic music, a public data archive could be established that provides multi-track MIDI files with contextual data.
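As a sketch of the basic unit of analysis, the code below extracts 2-tuples of consecutive vertical pitch-class sets from a MIDI file, assuming the third-party mido library; the paper's fingerprint-diagram and barcode construction is not reproduced here.

```python
import mido

def vertical_pcs(path):
    """Yield the sequence of distinct vertical pitch-class sets over time."""
    active, sets = set(), []
    for msg in mido.MidiFile(path):   # iterating merges tracks in time order
        if msg.type == "note_on" and msg.velocity > 0:
            active.add(msg.note % 12)
        elif msg.type == "note_off" or (msg.type == "note_on" and msg.velocity == 0):
            active.discard(msg.note % 12)
        else:
            continue
        pcs = frozenset(active)
        if pcs and (not sets or pcs != sets[-1]):
            sets.append(pcs)          # record only changes of the vertical pcs
    return sets

def pcs_2tuples(path):
    """Pairs of consecutive vertical pitch-class sets, the study's basic unit."""
    s = vertical_pcs(path)
    return list(zip(s, s[1:]))
```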
Project description: Musicians often say that they not only hear, but also "feel" music. To explore the contribution of tactile information to "feeling" musical rhythm, we investigated the degree to which auditory and tactile inputs are integrated in humans performing a musical meter recognition task. Subjects discriminated between two types of sequences, 'duple' (march-like rhythms) and 'triple' (waltz-like rhythms), presented in three conditions: 1) unimodal inputs (auditory or tactile alone); 2) various combinations of bimodal inputs, where sequences were distributed between the auditory and tactile channels such that a single channel did not produce coherent meter percepts; and 3) simultaneously presented bimodal inputs where the two channels contained congruent or incongruent meter cues. We first show that meter is perceived similarly well (70%-85%) when tactile or auditory cues are presented alone. We next show in the bimodal experiments that auditory and tactile cues are integrated to produce coherent meter percepts. Performance is high (70%-90%) when all of the metrically important notes are assigned to one channel and is reduced to 60% when half of these notes are assigned to one channel. When the important notes are presented simultaneously to both channels, congruent cues enhance meter recognition (90%). Performance drops dramatically when subjects are presented with incongruent auditory cues (10%), as opposed to incongruent tactile cues (60%), demonstrating that auditory input dominates meter perception. We believe that these results are the first demonstration of cross-modal sensory grouping between any two senses.
Project description: Sensory information is represented and elaborated in hierarchical cortical systems that are thought to be dedicated to individual sensory modalities. This traditional view of sensory cortex organization has been challenged by recent evidence of multimodal responses in primary and association sensory areas. Although it is indisputable that sensory areas respond to multiple modalities, it remains unclear whether these multimodal responses reflect selective information processing for particular stimulus features. Here, we used fMRI adaptation to identify brain regions that are sensitive to the temporal frequency information contained in auditory, tactile, and audiotactile stimulus sequences. A number of brain regions distributed over the parietal and temporal lobes exhibited frequency-selective temporal response modulation for both auditory and tactile stimulus events, as indexed by repetition suppression effects. A smaller set of regions responded to crossmodal adaptation sequences in a frequency-dependent manner. Despite an extensive overlap of multimodal frequency-selective responses across the parietal and temporal lobes, representational similarity analysis revealed a cortical "regional landscape" that clearly reflected distinct somatosensory and auditory processing systems that converged on modality-invariant areas. These structured relationships between brain regions were also evident in spontaneous signal fluctuation patterns measured at rest. Our results reveal that multimodal processing in human cortex can be feature-specific and that multimodal frequency representations are embedded in the intrinsically hierarchical organization of cortical sensory systems.
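The adaptation logic can be illustrated with a simple repetition-suppression index contrasting responses to frequency-repeat versus frequency-change sequences; the per-subject ROI responses below are hypothetical stand-ins for fitted BOLD amplitudes.

```python
import numpy as np

rng = np.random.default_rng(3)
beta_repeat = rng.normal(1.0, 0.2, size=20)  # ROI response, repeated frequency
beta_change = rng.normal(1.3, 0.2, size=20)  # ROI response, changed frequency

# Frequency-selective adaptation: response is suppressed when the temporal
# frequency repeats, so a positive index signals frequency sensitivity.
adaptation_index = (beta_change - beta_repeat) / (beta_change + beta_repeat)
print(f"mean adaptation index = {adaptation_index.mean():.2f}")
```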
Project description: The role of the spatial separation between the stimulating electrodes (electrode separation) in sequential stream segregation was explored in cochlear implant (CI) listeners using a deviant detection task. Twelve CI listeners were instructed to attend to a series of target sounds in the presence of interleaved distractor sounds. A deviant was randomly introduced in the target stream either at the beginning, middle, or end of each trial. The listeners were asked to detect sequences that contained a deviant and to report its location within the trial. The perceptual segregation of the streams should, therefore, improve deviant detection performance. The electrode range for the distractor sounds was varied, resulting in different amounts of overlap between the target and the distractor streams. For the largest electrode separation condition, event-related potentials (ERPs) were recorded under active and passive listening conditions. The listeners were asked to perform the behavioral task in the active listening condition and encouraged to watch a muted movie in the passive listening condition. Deviant detection performance improved with increasing electrode separation between the streams, suggesting that larger electrode differences facilitate the segregation of the streams. Deviant detection performance was best for deviants occurring late in the sequence, indicating that a segregated percept builds up over time. The analysis of the ERP waveforms revealed that auditory selective attention modulates the ERP responses in CI listeners. Specifically, the responses to the target stream were, overall, larger in the active relative to the passive listening condition. Conversely, the ERP responses to the distractor stream were not affected by selective attention. However, no significant correlation was observed between the behavioral performance and the amount of attentional modulation. Overall, the findings from the present study suggest that CI listeners can use electrode separation to perceptually group sequential sounds. Moreover, selective attention can be deployed on the resulting auditory objects, as reflected by the attentional modulation of the ERPs at the group level.
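The ERP attention contrast reported here amounts to comparing epoch averages across listening conditions. The sketch below uses hypothetical single-channel data, onset times, and latency window; real pipelines would add filtering, baseline correction, and artifact rejection.

```python
import numpy as np

FS, PRE, POST = 1000, 0.1, 0.5   # sampling rate (Hz), epoch bounds (s)

def erp(eeg, onsets):
    """Average fixed-length epochs of one channel, time-locked to onsets."""
    n = int((PRE + POST) * FS)
    starts = (np.asarray(onsets) * FS).astype(int) - int(PRE * FS)
    return np.mean([eeg[s:s + n] for s in starts], axis=0)

rng = np.random.default_rng(4)
eeg_active = rng.standard_normal(120 * FS)    # 2 min of data per condition
eeg_passive = rng.standard_normal(120 * FS)
onsets = np.arange(1.0, 118.0, 1.2)           # target-sound onset times (s)

win = slice(int((PRE + 0.08) * FS), int((PRE + 0.2) * FS))  # e.g., an N1 window
diff = erp(eeg_active, onsets)[win].mean() - erp(eeg_passive, onsets)[win].mean()
print(f"active minus passive mean amplitude in window: {diff:.3f}")
```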
Project description: Spatial hearing is widely regarded as helpful in recognizing a sound amid other competing sounds. It is a matter of debate, however, whether spatial cues contribute to "stream segregation," which refers to the specific task of assigning multiple interleaved sequences of sounds to their respective sources. The present study employed "rhythmic masking release" as a measure of the spatial acuity of stream segregation. Listeners discriminated between rhythms of noise-burst sequences presented from free-field targets in the presence of interleaved maskers that varied in location. For broadband sounds in the horizontal plane, target-masker separations of ≥8° permitted rhythm discrimination with d' ≥ 1; in some cases, such thresholds approached listeners' minimum audible angles. Thresholds were the same for low-frequency sounds but were substantially wider for high-frequency sounds, suggesting that interaural delays provided higher spatial acuity in this task than did interaural level differences. In the vertical midline, performance varied dramatically as a function of noise-burst duration with median thresholds ranging from >30° for 10-ms bursts to 7.1° for 40-ms bursts. A marked dissociation between minimum audible angles and masking release thresholds across the various pass-band and burst-duration conditions suggests that location discrimination and spatial stream segregation are mediated by distinct auditory mechanisms.
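For reference, the d' sensitivity measure behind these thresholds is computed from hit and false-alarm rates with the inverse standard normal; the example rates below are made up.

```python
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# For instance, 75% hits with 25% false alarms gives d' of about 1.35,
# exceeding the d' >= 1 criterion used for the thresholds above.
print(f"d' = {d_prime(0.75, 0.25):.2f}")
```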
Project description: Recent studies have challenged the traditional notion of modality-dedicated cortical systems by showing that audition and touch evoke responses in the same sensory brain regions. While much of this work has focused on somatosensory responses in auditory regions, fewer studies have investigated sound responses and representations in somatosensory regions. In this functional magnetic resonance imaging (fMRI) study, we measured BOLD signal changes in participants performing an auditory frequency discrimination task and characterized activation patterns related to stimulus frequency using both univariate and multivariate analysis approaches. Outside of bilateral temporal lobe regions, we observed robust and frequency-specific responses to auditory stimulation in classically defined somatosensory areas. Moreover, using representational similarity analysis to define the relationships between multi-voxel activation patterns for all sound pairs, we found clear similarity patterns for auditory responses in the parietal lobe that correlated significantly with perceptual similarity judgments. Our results demonstrate that auditory frequency representations can be distributed over brain regions traditionally considered to be dedicated to somatosensation. The broad distribution of auditory and tactile responses over parietal and temporal regions reveals a number of candidate brain areas that could support general temporal frequency processing and mediate the extensive and robust perceptual interactions between audition and touch.
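A minimal sketch of the representational similarity analysis described, assuming hypothetical multi-voxel patterns for eight sound frequencies and a perceptual dissimilarity structure derived from log-frequency spacing in place of actual similarity judgments.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
patterns = rng.standard_normal((8, 200))   # 8 sound frequencies x 200 voxels

# Stand-in perceptual dissimilarities: distances along a log-frequency axis.
freqs = np.geomspace(100, 800, 8)
perceptual_rdm = pdist(np.log(freqs)[:, None])

# Neural dissimilarities: 1 - correlation between multi-voxel patterns.
neural_rdm = pdist(patterns, metric="correlation")

rho, p = spearmanr(neural_rdm, perceptual_rdm)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```

A reliably positive rank correlation between the two dissimilarity structures is the signature of perceptually relevant frequency coding reported for the parietal responses above.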