Project description:This study investigated the benefits of differences in fundamental frequency (F0) and temporal onset between the sentences of a pair for listener groups differing in age and hearing sensitivity. Two experiments were completed, differing primarily in how the stimuli were presented. Experiment 1 used blocked stimulus presentation, which provided redundant acoustic cues to mark the target sentence in each pair, whereas Experiment 2 sampled a slightly more restricted stimulus space in a completely randomized presentation order. In both experiments, listeners were required to detect a cue word ("Baron") marking the target sentence in each pair and then to identify the target words (color, number) that appeared later in the target sentence. Results of Experiment 1 showed that F0 or onset-separation cues benefited both cue-word detection and color-number identification performance. There were no significant differences across groups in the ability to detect the cue word, but groups differed in their ability to identify the correct color-number words. Elderly adults with impaired hearing had the greatest difficulty with the identification task despite the application of spectral shaping to restore the audibility of the speech stimuli. For the most part, the primary results of Experiment 1 were replicated in Experiment 2, although, in the latter experiment, all older adults, whether they had normal or impaired hearing, performed worse than young adults with normal hearing. In Experiment 2, the benefit of a 6-semitone F0 difference between talkers was equivalent to that of a 300-ms onset asynchrony between sentences, and for such conditions the combination of both sound-segregation cues resulted in an additive benefit.
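The 6-semitone separation is easiest to interpret as a frequency ratio via the standard conversion ratio = 2^(n/12). A minimal sketch; the 100 Hz talker is an invented example, not a value from the study:

```python
# Convert a semitone separation into a frequency ratio: ratio = 2 ** (n / 12).
def semitones_to_ratio(n: float) -> float:
    return 2.0 ** (n / 12.0)

# Invented example: a 6-semitone separation from a 100 Hz talker.
base_f0 = 100.0                    # Hz (illustrative, not from the study)
ratio = semitones_to_ratio(6)      # ~1.414
print(f"{base_f0:.0f} Hz vs {base_f0 * ratio:.0f} Hz")  # -> 100 Hz vs 141 Hz
```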
Project description:Joint speech behaviours where speakers produce speech in unison are found in a variety of everyday settings, and have clinical relevance as a temporary fluency-enhancing technique for people who stutter. It is currently unknown whether such synchronisation of speech timing between two speakers is also accompanied by alignment in their vocal characteristics, for example in acoustic measures such as pitch. The current study investigated this by testing whether convergence in voice fundamental frequency (F0) between speakers could be demonstrated during synchronous speech. Sixty participants across two online experiments were audio recorded whilst reading a series of sentences, first on their own, and then in synchrony with another speaker (the accompanist) in a number of between-subject conditions. Experiment 1 demonstrated significant convergence in participants' F0 to a pre-recorded accompanist voice, in the form of both upward (high F0 accompanist condition) and downward (low and extra-low F0 accompanist conditions) changes in F0. Experiment 2 demonstrated that such convergence was not seen during a visual synchronous speech condition, in which participants spoke in synchrony with silent video recordings of the accompanist. An audiovisual condition in which participants were able to both see and hear the accompanist in pre-recorded videos did not result in greater convergence in F0 compared to synchronisation with the pre-recorded voice alone. These findings suggest the need for models of speech motor control to incorporate interactions between self- and other-speech feedback during speech production, and suggest a novel hypothesis for the mechanisms underlying the fluency-enhancing effects of synchronous speech in people who stutter.
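The description above does not specify the convergence metric; a common approach is to measure how much the participant-accompanist F0 distance (in semitones) shrinks from the solo baseline to the synchronous condition. A minimal sketch under that assumption, with invented values:

```python
import math

def hz_to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """Express an F0 value in semitones relative to a fixed reference."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def convergence(participant_solo: float, participant_sync: float,
                accompanist: float) -> float:
    """Positive values mean the participant moved toward the accompanist.

    Inputs are mean F0 values in Hz; this metric is an assumption, not
    necessarily the one used in the study.
    """
    d_solo = abs(hz_to_semitones(participant_solo) - hz_to_semitones(accompanist))
    d_sync = abs(hz_to_semitones(participant_sync) - hz_to_semitones(accompanist))
    return d_solo - d_sync

# Invented example: a participant at 200 Hz solo drops to 190 Hz when
# synchronising with a low-F0 (170 Hz) accompanist.
print(convergence(200.0, 190.0, 170.0))  # > 0, i.e. downward convergence
```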
Project description:The present study examined the relative influence of the off- and on-frequency spectral components of modulated and unmodulated maskers on consonant recognition. Stimuli were divided into 30 contiguous bands, each one equivalent rectangular bandwidth (ERB) wide. The temporal fine structure (TFS) in each "target" band was either left intact or replaced with tones using vocoder processing. Recognition scores for 10, 15, and 20 target bands randomly located in frequency were obtained in quiet and in the presence of all 30 masker bands, only the off-frequency masker bands, or only the on-frequency masker bands. The amount of masking produced by the on-frequency bands was generally comparable to that produced by the broadband masker. However, the difference between these two conditions was often significant, indicating an influence of the off-frequency masker bands, likely through modulation interference or spectral restoration. Although vocoder processing consistently led to poorer consonant recognition scores, the deficit observed in noise could often be attributed to that observed in quiet. These data indicate that (i) speech recognition is affected by the off-frequency components of the background and (ii) the nature of the target TFS does not systematically affect speech recognition in noise, especially when energetic masking and/or the number of target bands is limited.
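For readers unfamiliar with the band-splitting step, here is a minimal sketch of how 31 edges delimiting 30 contiguous 1-ERB-wide bands can be computed from the ERB-number (Cam) scale of Glasberg and Moore (1990); the 80 Hz lower edge is an illustrative assumption, not a parameter reported above:

```python
import math

def hz_to_cam(f_hz: float) -> float:
    """ERB-number (Cam) scale of Glasberg & Moore (1990)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def cam_to_hz(cam: float) -> float:
    """Inverse of hz_to_cam."""
    return (10.0 ** (cam / 21.4) - 1.0) / 0.00437

# 31 edges for 30 contiguous 1-ERB-wide bands, starting at an assumed
# 80 Hz lower edge (the study's exact frequency range is not given here).
low_cam = hz_to_cam(80.0)
edges_hz = [cam_to_hz(low_cam + i) for i in range(31)]
print([round(e) for e in edges_hz[:4]], "...", round(edges_hz[-1]))
```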
Project description:This study investigates whether listeners' experience with a second language learned later in life affects their use of fundamental frequency (F0) as a cue to word boundaries in the segmentation of an artificial language (AL), particularly when the cues to word boundaries conflict between the first language (L1) and second language (L2). F0 signals phrase-final (and thus word-final) boundaries in French but word-initial boundaries in English. Participants were functionally monolingual French listeners, functionally monolingual English listeners, bilingual L1-English L2-French listeners, and bilingual L1-French L2-English listeners. They completed the AL-segmentation task with F0 signaling word-final boundaries or without prosodic cues to word boundaries (monolingual groups only). After listening to the AL, participants completed a forced-choice word-identification task in which the foils were either non-words or part-words. The results show that the monolingual French listeners, but not the monolingual English listeners, performed better in the presence of F0 cues than in the absence of such cues. Moreover, bilingual status modulated listeners' use of F0 cues to word-final boundaries, with bilingual French listeners performing less accurately than monolingual French listeners on both word types but with bilingual English listeners performing more accurately than monolingual English listeners on non-words. These findings not only confirm that speech segmentation is modulated by the L1, but also newly demonstrate that listeners' experience with the L2 (French or English) affects their use of F0 cues in speech segmentation. This suggests that listeners' use of prosodic cues to word boundaries is adaptive and non-selective, and can change as a function of language experience.
Project description:Purpose: This study investigated how modulating fundamental frequency (f0) and speech rate differentially impact the naturalness, intelligibility, and communication efficiency of synthetic speech. Method: Sixteen sentences of varying prosodic content were developed via a speech synthesizer. The f0 contour and speech rate of these sentences were altered to produce 4 stimulus sets: (a) normal rate with a fixed f0 level, (b) slow rate with a fixed f0 level, (c) normal rate with prosodically natural f0 variation, and (d) normal rate with prosodically unnatural f0 variation. Sixteen listeners provided orthographic transcriptions and judgments of naturalness for these stimuli. Results: Sentences with f0 variation were rated as more natural than those with a fixed f0 level. Conversely, sentences with a fixed f0 level demonstrated higher intelligibility than those with f0 variation. Speech rate did not affect the intelligibility of stimuli with a fixed f0 level. Communication efficiency was highest for sentences produced at a normal rate and a fixed f0 level. Conclusions: Sentence-level f0 variation increased naturalness ratings of synthesized speech, whether the variation was prosodically natural or not. However, these f0 variations reduced intelligibility. There is evidence of a trade-off between the naturalness and intelligibility of synthesized speech, which may impact future speech synthesis designs. Supplemental Material: https://doi.org/10.23641/asha.8847833.
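The description above does not define the efficiency measure; one common operationalization is intelligibility per unit of speaking time (e.g., correctly transcribed words per second). A minimal sketch under that assumption, with invented numbers:

```python
def communication_efficiency(words_correct: int, duration_s: float) -> float:
    """Correct words per second -- an assumed, common operationalization."""
    return words_correct / duration_s

# Invented example: slowing the rate lengthens the sentence, so even with
# equal intelligibility the efficiency drops.
print(communication_efficiency(9, 3.0))  # normal rate: 3.0 words/s
print(communication_efficiency(9, 4.5))  # slow rate:   2.0 words/s
```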
Project description:This study investigates whether the learning of prosodic cues to word boundaries in speech segmentation is more difficult if the native and second/foreign languages (L1 and L2) have similar (though non-identical) prosodies than if they have markedly different prosodies (Prosodic-Learning Interference Hypothesis). It does so by comparing French, Korean, and English listeners' use of fundamental-frequency (F0) rise as a cue to word-final boundaries in French. F0 rise signals phrase-final boundaries in French and Korean but word-initial boundaries in English. Korean-speaking and English-speaking L2 learners of French, who were matched in their French proficiency and French experience, and native French listeners completed a visual-world eye-tracking experiment in which they recognized words whose final boundary was or was not cued by an increase in F0. The results showed that Korean listeners had greater difficulty using F0 rise as a cue to word-final boundaries in French than French and English listeners. This suggests that L1-L2 prosodic similarity can make the learning of an L2 segmentation cue difficult, in line with the proposed Prosodic-Learning Interference Hypothesis. We consider mechanisms that may underlie this difficulty and discuss the implications of our findings for understanding listeners' phonological encoding of L2 words.
Project description:Objectives/Hypotheses: Charismatic leaders use vocal behavior to persuade their audience, achieve goals, arouse emotional states, and convey personality traits and leadership status. This study investigates voice fundamental frequency (f0) and sound pressure level (SPL) in female and male French, Italian, Brazilian, and American politicians to determine which acoustic parameters are related to cross-gender and cross-cultural common vocal abilities, and which derive from culture-, gender-, and language-specific vocal strategies used to adapt vocal behavior to listeners' culture-related expectations. Study design: Speech corpora were collected for two formal communicative contexts (leaders address followers or other leaders) and one informal communicative context (dyadic interaction), based on the persuasive goals inherent in each context and on the relative status of the listeners and speakers. Leaders' acoustic voice profiles were created to show differences in f0 and SPL manipulation with respect to speakers' gender and language in each communicative context. Results: Cross-gender and cross-language similarities in manipulation of average f0 and in f0 and SPL ranges occurred in all communicative contexts. Patterns of f0 manipulation were shared across genders and cultures, suggesting this dimension might be biologically based and is exploited by leaders to convey dominance. Ranges for f0 and SPL seemed to be affected by the communicative context, being wider or narrower depending on the persuasive goal. Results also showed language- and speaker-specific differences in the acoustic manipulation of f0 and SPL over time. Conclusions: These findings are consistent with the idea that specific charismatic leaders' vocal behaviors depend on a fine combination of vocal abilities that are shared across cultures and genders, combined with culturally- and linguistically-filtered vocal strategies.
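As a concrete illustration of the two acoustic dimensions profiled above, a small sketch of the standard conversions: F0 range expressed as a semitone interval (comparable across genders) and SPL in dB re 20 µPa. All numeric values are invented:

```python
import math

def f0_range_semitones(f0_min_hz: float, f0_max_hz: float) -> float:
    """F0 range as a log-scaled (semitone) interval."""
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)

def spl_db(rms_pa: float, ref_pa: float = 20e-6) -> float:
    """Sound pressure level in dB re 20 micropascals."""
    return 20.0 * math.log10(rms_pa / ref_pa)

# Invented example values for one speaker in one communicative context.
print(f0_range_semitones(90.0, 220.0))  # ~15.5 st: a wide, expressive range
print(spl_db(0.1))                      # ~74 dB SPL
```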
Project description:Speech recognition by second language (L2) learners in optimal and suboptimal conditions has been examined extensively with English as the target language in most previous studies. This study extended existing experimental protocols (Wang et al., 2013) to investigate Mandarin speech recognition by Japanese learners of Mandarin at two different levels (elementary vs. intermediate) of proficiency. The overall results showed that in addition to L2 proficiency, semantic context, F0 contours, and listening condition all affected recognition performance on the Mandarin sentences. However, the effects of semantic context and F0 contours on L2 speech recognition diverged to some extent. Specifically, there was a significant modulation effect of listening condition on semantic context, indicating that L2 learners made use of semantic context less efficiently in the interfering background than in quiet. In contrast, no significant modulation effect of listening condition on F0 contours was found. Furthermore, there was a significant interaction between semantic context and F0 contours, indicating that semantic context becomes more important for L2 speech recognition when F0 information is degraded. None of these effects were found to be modulated by L2 proficiency. The discrepancy in the effects of semantic context and F0 contours on L2 speech recognition in the interfering background might be related to differences in the processing capacities required by the two types of information in adverse listening conditions.
Project description:Recent research on speech communication has revealed a tendency for speakers to imitate at least some of the characteristics of their interlocutor's speech sound shape. This phenomenon, referred to as phonetic convergence, entails a moment-to-moment adaptation of the speaker's speech targets to the perceived interlocutor's speech. It is thought to contribute to setting up a conversational common ground between speakers and to facilitate mutual understanding. However, it remains uncertain to what extent phonetic convergence occurs in voice fundamental frequency (F0), in spite of the major role played by pitch, F0's perceptual correlate, as a conveyor of both linguistic information and communicative cues associated with the speaker's social/individual identity and emotional state. In the present work, we investigated to what extent two speakers converge towards each other with respect to variations in F0 in a scripted dialogue. Pairs of speakers jointly performed a speech production task, in which they were asked to alternately read aloud a written story divided into a sequence of short reading turns. We devised an experimental set-up that allowed us to manipulate the speakers' F0 in real time across turns. We found that speakers tended to imitate each other's changes in F0 across turns, even though these changes were both limited in amplitude and spread over large temporal intervals. This shows that, at the perceptual level, speakers monitor slow-varying movements in their partner's F0 with high accuracy and, at the production level, that speakers exert very fine-tuned control over their laryngeal vibration in order to imitate these F0 variations. Remarkably, F0 convergence across turns was found to occur in spite of the large melodic variations typically associated with reading turns. Our study sheds new light on speakers' perceptual tracking of F0 in speech processing, and on the impact of this perceptual tracking on speech production.
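One simple way to quantify the turn-by-turn F0 imitation described above (an assumed analysis, not necessarily the paper's own) is to regress one speaker's turn-level mean F0 on the partner's mean F0 from the preceding turn; a positive slope indicates tracking:

```python
def ols_slope(x, y):
    """Ordinary-least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

# Invented turn-level mean F0 values (Hz); speaker B loosely tracks speaker A.
a_turns = [118.0, 121.0, 116.0, 124.0, 119.0]   # speaker A, turns 1-5
b_turns = [201.0, 203.0, 199.0, 205.0, 202.0]   # speaker B, following turns
print(ols_slope(a_turns, b_turns))  # positive slope (~0.73) => F0 tracking
```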
Project description:Purpose: This study investigated methods used to simulate factors associated with reduced audibility, increased speech levels, and spectral shaping for aided older adults with hearing loss. Simulations provided to younger normal-hearing adults were used to investigate the effects of sensation level, speech presentation level, and spectral shape in comparison to older adults with hearing loss. Method: Measures were assessed in quiet, steady-state noise, and speech-modulated noise. Older adults with hearing loss listened to speech that was spectrally shaped according to their hearing thresholds. Younger adults with normal hearing listened to speech that simulated the hearing-impaired group's (a) reduced audibility, (b) increased speech levels, and (c) spectral shaping. Group comparisons were made based on speech recognition performance and masking release. Additionally, younger adults completed measures of listening effort and perceived speech quality to assess whether differences across simulations in these outcome measures were similar to those for speech recognition. Results: Across the various simulations employed, testing in the presence of a threshold-matching noise best matched the differences in speech recognition and masking release between younger and older adults. This result remained consistent across the other two outcome measures. Conclusions: A combination of audibility, speech level, and spectral shape factors is required to simulate differences between listeners with normal and impaired hearing in recognition, listening effort, and perceived speech quality. The use of spectrally shaped and amplified speech in the presence of threshold-matching noise best provided this simulated control. Supplemental Material: https://doi.org/10.23641/asha.13224632.
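A rough sketch of the threshold-matching-noise idea: set per-band noise levels from the impaired listener's audiometric thresholds so that a normal-hearing listener's effective (masked) thresholds approximate those of the listener being simulated. The band frequencies, audiogram, and calibration offset below are all illustrative assumptions, not values from the study:

```python
# Illustrative audiometric bands and an invented audiogram (dB HL).
AUDIOMETRIC_HZ = [250, 500, 1000, 2000, 4000, 8000]
thresholds_db_hl = [20, 25, 35, 50, 60, 65]

def masking_noise_levels(thresholds, offset_db=-4.0):
    """Per-band noise levels; offset_db is an assumed calibration term
    relating the noise level in a band to the masked threshold it produces."""
    return [t + offset_db for t in thresholds]

for f, level in zip(AUDIOMETRIC_HZ, masking_noise_levels(thresholds_db_hl)):
    print(f"{f:>5} Hz band -> noise at {level:.0f} dB")
```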