Dataset Information

Weighting of Prosodic and Lexical-Semantic Cues for Emotion Identification in Spectrally Degraded Speech and With Cochlear Implants.

ABSTRACT:

Objectives

Normally-hearing (NH) listeners rely more on prosodic cues than on lexical-semantic cues for emotion perception in speech. In everyday spoken communication, the ability to decipher conflicting information between prosodic and lexical-semantic cues to emotion can be important: for example, in identifying sarcasm or irony. Speech degradation in cochlear implants (CIs) can be sufficiently overcome to identify lexical-semantic cues, but the distortion of voice pitch cues makes it particularly challenging to hear prosody with CIs. The purpose of this study was to examine changes in relative reliance on prosodic and lexical-semantic cues in NH adults listening to spectrally degraded speech and adult CI users. We hypothesized that, compared with NH counterparts, CI users would show increased reliance on lexical-semantic cues and reduced reliance on prosodic cues for emotion perception. We predicted that NH listeners would show a similar pattern when listening to CI-simulated versions of emotional speech.

Design

Sixteen NH adults and 8 postlingually deafened adult CI users participated in the study. Sentences were created to convey five lexical-semantic emotions (angry, happy, neutral, sad, and scared), with five sentences expressing each category of emotion. Each of these 25 sentences was then recorded with the 5 (angry, happy, neutral, sad, and scared) prosodic emotions by 2 adult female talkers. The resulting stimulus set included 125 recordings (25 Sentences × 5 Prosodic Emotions) per talker, of which 25 were congruent (consistent lexical-semantic and prosodic cues to emotion) and the remaining 100 were incongruent (conflicting lexical-semantic and prosodic cues to emotion). The recordings were processed to have 3 levels of spectral degradation: full-spectrum, CI-simulated (noise-vocoded) to have 8 channels and 16 channels of spectral information, respectively. Twenty-five recordings (one sentence per lexical-semantic emotion recorded in all five prosodies) were used for a practice run in the full-spectrum condition. The remaining 100 recordings were used as test stimuli. For each talker and condition of spectral degradation, listeners indicated the emotion associated with each recording in a single-interval, five-alternative forced-choice task. The responses were scored as proportion correct, where "correct" responses corresponded to the lexical-semantic emotion. CI users heard only the full-spectrum condition.

Results

The results showed a significant interaction between hearing status (NH, CI) and congruency in identifying the lexical-semantic emotion associated with the stimuli. This interaction was as predicted, that is, CI users showed increased reliance on lexical-semantic cues in the incongruent conditions, while NH listeners showed increased reliance on the prosodic cues in the incongruent conditions. As predicted, NH listeners showed increased reliance on lexical-semantic cues to emotion when the stimuli were spectrally degraded.

Conclusions

The present study confirmed previous findings of prosodic dominance for emotion perception by NH listeners in the full-spectrum condition. Further, novel findings with CI patients and NH listeners in the CI-simulated conditions showed reduced reliance on prosodic cues and increased reliance on lexical-semantic cues to emotion. These results have implications for CI listeners' ability to perceive conflicts between prosodic and lexical-semantic cues, with repercussions for their identification of sarcasm and humor. Understanding instances of sarcasm or humor can impact a person's ability to develop relationships, follow conversation, understand vocal emotion and intended message of a speaker, following jokes, and everyday communication in general.

SUBMITTER: Richter ME

PROVIDER: S-EPMC8545870 | biostudies-literature | 2021 Nov-Dec 01

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Weighting of Prosodic and Lexical-Semantic Cues for Emotion Identification in Spectrally Degraded Speech and With Cochlear Implants.

Richter Margaret E ME Chatterjee Monita M

Ear and hearing 20211101 6

<h4>Objectives</h4>Normally-hearing (NH) listeners rely more on prosodic cues than on lexical-semantic cues for emotion perception in speech. In everyday spoken communication, the ability to decipher conflicting information between prosodic and lexical-semantic cues to emotion can be important: for example, in identifying sarcasm or irony. Speech degradation in cochlear implants (CIs) can be sufficiently overcome to identify lexical-semantic cues, but the distortion of voice pitch cues makes it ...[more]

PMID: 34294630

Similar Datasets

Project description:IntroductionCochlear implants (CIs) are the treatment of choice for severe to profound hearing loss. Variability in CI outcomes remains despite advances in technology and is attributed in part to differences in cortical processing. Studying these differences in CI users is technically challenging. Spectrally degraded stimuli presented to normal-hearing individuals approximate input to the central auditory system in CI users. This study used intracranial electroencephalography (iEEG) to investigate cortical processing of spectrally degraded speech.MethodsParticipants were adult neurosurgical epilepsy patients. Stimuli were utterances /aba/ and /ada/, spectrally degraded using a noise vocoder (1-4 bands) or presented without vocoding. The stimuli were presented in a two-alternative forced choice task. Cortical activity was recorded using depth and subdural iEEG electrodes. Electrode coverage included auditory core in posteromedial Heschl's gyrus (HGPM), superior temporal gyrus (STG), ventral and dorsal auditory-related areas, and prefrontal and sensorimotor cortex. Analysis focused on high gamma (70-150 Hz) power augmentation and alpha (8-14 Hz) suppression.ResultsChance task performance occurred with 1-2 spectral bands and was near-ceiling for clear stimuli. Performance was variable with 3-4 bands, permitting identification of good and poor performers. There was no relationship between task performance and participants demographic, audiometric, neuropsychological, or clinical profiles. Several response patterns were identified based on magnitude and differences between stimulus conditions. HGPM responded strongly to all stimuli. A preference for clear speech emerged within non-core auditory cortex. Good performers typically had strong responses to all stimuli along the dorsal stream, including posterior STG, supramarginal, and precentral gyrus; a minority of sites in STG and supramarginal gyrus had a preference for vocoded stimuli. In poor performers, responses were typically restricted to clear speech. Alpha suppression was more pronounced in good performers. In contrast, poor performers exhibited a greater involvement of posterior middle temporal gyrus when listening to clear speech.DiscussionResponses to noise-vocoded speech provide insights into potential factors underlying CI outcome variability. The results emphasize differences in the balance of neural processing along the dorsal and ventral stream between good and poor performers, identify specific cortical regions that may have diagnostic and prognostic utility, and suggest potential targets for neuromodulation-based CI rehabilitation strategies.

Project description:In the clinical fitting of cochlear implants (CIs), the lowest input acoustic frequency is typically much lower than the characteristic frequency associated with the most apical electrode position, due to the limited electrode insertion depth. For bilateral CI users, electrode positions may differ across ears. However, the same acoustic-to-electrode frequency allocation table (FAT) is typically assigned to both ears. As such, bilateral CI users may experience both intra-aural frequency mismatch within each ear and inter-aural mismatch across ears. This inter-aural mismatch may limit the ability of bilateral CI users to take advantage of spatial cues when attempting to segregate competing speech. Adjusting the FAT to tonotopically match the electrode position in each ear (i.e., increasing the low acoustic input frequency) is theorized to reduce this inter-aural mismatch. Unfortunately, this approach may also introduce the loss of acoustic information below the modified input acoustic frequency. The present study explored the trade-off between reduced inter-aural frequency mismatch and low-frequency information loss for segregation of competing speech. Normal-hearing participants were tested while listening to acoustic simulations of bilateral CIs. Speech reception thresholds (SRTs) were measured for target sentences produced by a male talker in the presence of two different male talkers. Masker speech was either co-located with or spatially separated from the target speech. The bilateral CI simulations were produced by 16-channel sinewave vocoders; the simulated insertion depth was fixed in one ear and varied in the other ear, resulting in an inter-aural mismatch of 0, 2, or 6 mm in terms of cochlear place. Two FAT conditions were compared: 1) clinical (200-8000 Hz in both ears), or 2) matched to the simulated insertion depth in each ear. Results showed that SRTs were significantly lower with the matched than with the clinical FAT, regardless of the insertion depth or spatial configuration of the masker speech. The largest improvement in SRTs with the matched FAT was observed when the inter-aural mismatch was largest (6 mm). These results suggest that minimizing inter-aural mismatch with tonotopically matched FATs may benefit bilateral CI users' ability to segregate competing speech despite substantial low-frequency information loss in ears with shallow insertion depths.

Project description:Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal situations speech is perceived as an undemanding task. Degradation of the signal or the receiver channel can quickly bring this well-adjusted timing out of balance and lead to increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has its consequences on lexical processing as it adds uncertainty to the forming and revising of lexical hypotheses.

Dataset Information

Weighting of Prosodic and Lexical-Semantic Cues for Emotion Identification in Spectrally Degraded Speech and With Cochlear Implants.

Objectives

Design

Results

Conclusions

Publications

Weighting of Prosodic and Lexical-Semantic Cues for Emotion Identification in Spectrally Degraded Speech and With Cochlear Implants.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets