Dataset Information

A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.

ABSTRACT: In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds is performed by analysis-by-synthesis using the Maeda articulatory model. Vocal tract (VT) acoustics are computed with chain matrices, and the derivatives of the chain matrices with respect to the area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between the first three natural and synthesized formants, as well as parameter regularization and continuity terms. Cost-function-based calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-to-VT-outline distances of around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.

SUBMITTER: Panchapagesan S

PROVIDER: S-EPMC3188964 | biostudies-other | 2011 Apr

REPOSITORIES: biostudies-other
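The chain-matrix computation of VT acoustics described in the abstract can be illustrated with a minimal sketch. It assumes a lossless concatenated-tube vocal tract (no wall, viscous, or radiation losses), a nominal speed of sound of 35000 cm/s, and zero acoustic pressure at the lips. Under these assumptions the D element of the glottis-to-lips chain matrix is real and its zeros are the formant frequencies. The Maeda model's mapping from articulatory parameters to an area function is not reproduced here. `inversion_cost` is a hypothetical cost with the three kinds of terms the abstract lists (formant distance, parameter regularization, continuity); its relative-error form and its weights are illustrative choices, not the study's.

```python
import numpy as np

# Assumption: nominal speed of sound in warm, humid air (cm/s).
SPEED_OF_SOUND = 35000.0
# rho*c scale factor; it cancels out of the D element used below.
RHO_C = 1.0


def section_matrix(area, length, freq):
    """Lossless chain (ABCD) matrix of one uniform tube section.

    area in cm^2, length in cm, freq in Hz. Losses are ignored (assumption).
    """
    kl = 2.0 * np.pi * freq / SPEED_OF_SOUND * length
    z = RHO_C / area  # characteristic acoustic impedance of the section
    return np.array([[np.cos(kl), 1j * z * np.sin(kl)],
                     [1j * np.sin(kl) / z, np.cos(kl)]])


def chain_matrix(areas, lengths, freq):
    """Product of section matrices ordered from glottis (first) to lips (last)."""
    K = np.eye(2, dtype=complex)
    for area, length in zip(areas, lengths):
        K = K @ section_matrix(area, length, freq)
    return K


def d_element(areas, lengths, freq):
    # Assumption: zero pressure at the lips, so U_glottis = D * U_lips and the
    # volume-velocity transfer function is 1/D. In the lossless case D is real,
    # and its zeros are the formant frequencies.
    return chain_matrix(areas, lengths, freq)[1, 1].real


def formants(areas, lengths, n=3, fmax=5000.0, df=10.0):
    """First n zeros of D(f), found by a grid scan followed by bisection."""
    grid = np.arange(df, fmax, df)
    values = [d_element(areas, lengths, f) for f in grid]
    found = []
    for f0, f1, v0, v1 in zip(grid[:-1], grid[1:], values[:-1], values[1:]):
        if v0 * v1 >= 0:
            continue  # no sign change in this interval
        lo, hi, v_lo = f0, f1, v0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            v_mid = d_element(areas, lengths, mid)
            if v_mid * v_lo > 0:
                lo, v_lo = mid, v_mid
            else:
                hi = mid
        found.append(0.5 * (lo + hi))
        if len(found) == n:
            break
    return np.array(found)


def inversion_cost(params, f_syn, f_nat, p_rest, lam_reg=0.1, lam_cont=1.0):
    """Hypothetical analysis-by-synthesis cost over a trajectory of T frames.

    params: (T, P) articulatory parameters; f_syn, f_nat: (T, 3) formants in Hz;
    p_rest: (P,) reference parameter vector. Weights are illustrative assumptions.
    """
    formant_term = np.sum(((f_syn - f_nat) / f_nat) ** 2)  # relative formant error
    reg_term = lam_reg * np.sum((params - p_rest) ** 2)  # parameter regularization
    cont_term = lam_cont * np.sum(np.diff(params, axis=0) ** 2)  # frame-to-frame continuity
    return formant_term + reg_term + cont_term
```

For a uniform 17.5 cm tube (ten 1.75 cm sections of 3 cm²), D(f) reduces to cos(2πfL/c), so `formants([3.0] * 10, [1.75] * 10)` should return values close to the quarter-wavelength resonances of 500, 1500, and 2500 Hz. A quasi-Newton optimizer would then minimize a cost of this kind over the parameter trajectory; the study supplies it with chain-matrix derivatives with respect to the area function, a step this sketch omits.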

Similar Datasets

Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion.

Project description: An automatic speech recognition approach is presented that uses articulatory features estimated by subject-independent acoustic-to-articulatory inversion. The inversion allows articulatory features to be estimated from any talker's speech acoustics using only an exemplar subject's articulatory-to-acoustic map. Results are reported for a broad-class phonetic classification experiment on speech from English talkers, using data from three distinct English talkers as exemplars for inversion. Including the articulatory information improves classification accuracy, and the improvement is larger when the speaking styles of the exemplar and the talker are matched than when they are mismatched.

| S-EPMC3189967 | biostudies-other

Whole-brain dynamics of articulatory, acoustic and semantic speech representations.

Project description: Speech production is a complex process that traverses several representations, from the meaning of spoken words (semantic), through the movement of articulatory muscles (articulatory), to the produced audio waveform (acoustic). In this study, we identify how these different representations of speech are spatially and temporally distributed throughout the depth of the brain. Intracranial neural data are recorded from 15 participants, across 1647 electrode contacts, while they overtly speak 100 unique words. We find a bilateral spatial distribution for all three representations, with a more widespread and temporally dynamic distribution in the left hemisphere than in the right. The articulatory and acoustic representations share a similar spatial distribution surrounding the Sylvian fissure, while the semantic representation is more widely distributed across the brain in a mostly distinct network. These results highlight the distributed nature of the neural processes underlying speech production and the potential of non-motor representations for speech brain-computer interfaces.

| S-EPMC11906857 | biostudies-literature

Modeling consonant-vowel coarticulation for articulatory speech synthesis.

Project description: A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for generating highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation in the articulatory and acoustic realization of phonemes, especially consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. The vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied to synthesize consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%, demonstrating the potential of the approach for highly intelligible articulatory speech synthesis.

| S-EPMC3628899 | biostudies-literature
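The weighted-average construction in the description above can be sketched as follows. The description does not specify how the context vowel is mapped into the corner-vowel subspace, so this sketch assumes that vocal tract shapes are parameter vectors and that the mapping is an affine least-squares projection onto the plane through the /a/, /i/, /u/ shapes, with weights summing to one. Both function names are hypothetical. Vowels lying outside the corner-vowel triangle would receive some negative weights under this assumption.

```python
import numpy as np


def corner_vowel_weights(vowel, corners):
    """Affine weights w (summing to 1) such that vowel ~ sum_i w[i] * corners[i].

    vowel: (P,) target-shape vector of the context vowel.
    corners: (3, P) target shapes for /a/, /i/, /u/.
    Assumption: solved by least squares in the plane through the three corners.
    """
    c = np.asarray(corners, dtype=float)
    v = np.asarray(vowel, dtype=float)
    basis = (c[:2] - c[2]).T  # (P, 2): directions /a/-/u/ and /i/-/u/
    w12, *_ = np.linalg.lstsq(basis, v - c[2], rcond=None)
    return np.array([w12[0], w12[1], 1.0 - w12.sum()])


def consonant_target(vowel, vowel_corners, consonant_refs):
    """Context-dependent consonant shape as a weighted average of its references.

    consonant_refs: (3, P) consonant shapes measured in /a/, /i/, /u/ context.
    """
    w = corner_vowel_weights(vowel, vowel_corners)
    return w @ np.asarray(consonant_refs, dtype=float)
```

Using an affine (sum-to-one) parameterization keeps the result an interpolation between the three reference consonant shapes, so a context vowel identical to a corner vowel reproduces that corner's reference consonant shape exactly.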

A generalized smoothness criterion for acoustic-to-articulatory inversion.

Project description: Because the mapping from the speech articulatory space to the acoustic space is many-to-one, the associated acoustic-to-articulatory inverse mapping is non-unique. Among various techniques, imposing smoothness constraints on the articulator trajectories is a common approach to handling this non-uniqueness, because articulators typically move smoothly during speech production. A standard smoothness constraint minimizes the energy of the difference of the articulatory position sequence, so that the articulator trajectory is smooth and low-pass in nature. Such a fixed definition of smoothness is not always realistic or adequate, because different articulators have different degrees of smoothness. In this paper, an optimization formulation of the inversion problem is proposed that includes a generalized smoothness criterion. Under this criterion, the smoothness parameter can be chosen for each articulator in a data-driven fashion. In addition, the formulation allows articulatory positions to be estimated recursively over time without any loss in performance. Experiments with the MOCHA TIMIT database show that articulator trajectories estimated with the generalized smoothness criterion have lower RMS error and higher correlation with the measured trajectories than those obtained with a fixed smoothness constraint.

| S-EPMC2981125 | biostudies-other

Automatic prediction of intelligible speaking rate for individuals with ALS from speech acoustic and articulatory samples.

Project description: Purpose: This research aimed to automatically predict intelligible speaking rate for individuals with amyotrophic lateral sclerosis (ALS) from speech acoustic and articulatory samples. Method: Twelve participants with ALS and two normal subjects produced a total of 1831 phrases. The NDI Wave system was used to record tongue and lip movement and acoustic data synchronously. A machine learning algorithm (a support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples. Result: Used separately, acoustic, lip movement, and tongue movement information yielded R² values of 0.652, 0.660, and 0.678 and root mean squared errors (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between the predicted and actual values, respectively. Combining acoustic, lip, and tongue information gave the highest R² (0.712) and the lowest RMSE (37.562 WPM). Conclusion: The proposed analyses predicted the participants' intelligible speaking rate with reasonably high accuracy by extracting acoustic and/or articulatory features from one short speech sample. With further development, the analyses may be well suited to clinical applications that require automatic prediction of speech severity.

| S-EPMC6506394 | biostudies-literature

Activation of articulatory information in speech perception.

Project description: Emerging neurophysiologic evidence indicates that motor systems are activated during the perception of speech, but whether this activity reflects basic processes underlying speech perception remains a matter of considerable debate. Our contribution to this debate is to report direct behavioral evidence that specific articulatory commands are activated automatically and involuntarily during speech perception. We used electropalatography to measure whether motor information activated from spoken distractors would yield specific distortions on the articulation of printed target syllables. Participants produced target syllables beginning with /k/ or /s/ while listening to the same syllables or to incongruent rhyming syllables beginning with /t/. Tongue-palate contact for target productions was measured during the articulatory closure of /k/ and during the frication of /s/. Results revealed "traces" of the incongruent distractors on target productions, with the incongruent /t/-initial distractors inducing greater alveolar contact in the articulation of /k/ and /s/ than the congruent distractors. Two further experiments established that (i) the nature of this interference effect is dependent specifically on the articulatory properties of the spoken distractors; and (ii) this interference effect is unique to spoken distractors and does not arise when distractors are presented in printed form. Results are discussed in terms of a broader emerging framework concerning the relationship between perception and action, whereby the perception of action entails activation of the motor system.

| S-EPMC2818927 | biostudies-literature

An exploratory model of speech intelligibility for healthy aging based on phonatory and articulatory measures.

Project description: Purpose: The aims of the current study were to determine age-related changes to the phonatory and articulatory subsystems and to investigate an exploratory model of intelligibility for healthy aging based on phonatory and articulatory measures. Method: Fifteen healthy older adults (55-81 years) and 15 younger adults (20-35 years) participated in instrumental assessments of the phonatory (aerodynamic, acoustic) and articulatory (kinematic) subsystems. Speech intelligibility was determined by five listeners during multi-talker babble. Results: Older adults displayed shorter maximum phonation time, greater airflow during sentence reading, and lower cepstral peak prominence (CPP) and CPP SD. Additionally, older adults had slower tongue movement speed than younger adults. Speech intelligibility was also significantly reduced in the older group. A generalized estimating equations model combining phonatory and articulatory measures showed that CPP SD, low/high (L/H) spectral ratio mean and SD, Cepstral Spectral Index of Dysphonia (CSID), and maximum tongue movement speed were significant contributors to intelligibility changes in older individuals. While L/H mean and SD and CSID displayed an inverse relationship with intelligibility, CPP SD and maximum tongue speed displayed a direct relationship with intelligibility. Discussion: Aging affects the phonatory and articulatory subsystems, with implications for speech intelligibility. Phonatory cepstral/spectral measures (except mean CPP) were associated with speech intelligibility changes, suggesting that changes in voice quality may contribute to reduced intelligibility in older adults. Pertaining to articulation, slower tongue movement speed likely contributed to reduced intelligibility in older individuals.

| S-EPMC7494532 | biostudies-literature

The Articulatory Phonetics of /r/ for Residual Speech Errors.

Project description: Effective treatment for children with residual speech errors (RSEs) requires in-depth knowledge of articulatory phonetics, but this level of detail may not be provided in typical clinical coursework. As new imaging technologies such as ultrasound continue to inform our clinical understanding of speech disorders, incorporating contemporary work in the basic articulatory sciences into clinical training becomes especially important. This is particularly true for the speech sound most likely to persist among children with RSEs: the North American English rhotic /r/. This article reviews important information about articulatory phonetics as it affects children with RSEs who have difficulty producing /r/. The data presented are drawn largely from ultrasound and magnetic resonance imaging studies. This information is placed in a clinical context by comparing productions of typical adult speakers with successful and misarticulated productions of two children with persistent /r/ difficulties.

| S-EPMC4915106 | biostudies-literature

Tutorial: Using Visual-Acoustic Biofeedback for Speech Sound Training.

Project description: Purpose: This tutorial summarizes current practices for using visual-acoustic biofeedback (VAB) treatment to improve speech outcomes for individuals with speech sound difficulties. Clinical strategies focus on residual distortions of /ɹ/. Method: Evidence on the characteristics of VAB and on the populations that may benefit from this treatment is reviewed. Guidelines are provided for clinicians on how to use VAB with clients to identify and modify their productions to match an acoustic representation. The clinical application of a linear predictive coding (LPC) spectrum is emphasized. Results: Successful use of VAB depends on several key factors, including clinician and client comprehension of the acoustic representation, appropriate selection of acoustic targets and templates, and appropriate selection of articulatory strategies, practice schedules, and feedback models to scaffold acquisition of new speech sounds. Conclusion: Integrating a VAB component into clinical practice offers additional intervention options for individuals with speech sound difficulties and often facilitates improved speech sound acquisition and generalization outcomes. Supplemental material: https://doi.org/10.23641/asha.21817722

| S-EPMC10023147 | biostudies-literature

Encoding of Articulatory Kinematic Trajectories in Human Speech Sensorimotor Cortex.

Project description: When speaking, we dynamically coordinate movements of our jaw, tongue, lips, and larynx. To investigate the neural mechanisms underlying articulation, we used direct cortical recordings from human sensorimotor cortex while participants spoke natural sentences that included sounds spanning the entire English phonetic inventory. We used deep neural networks to infer speakers' articulator movements from produced speech acoustics. Individual electrodes encoded a diversity of articulatory kinematic trajectories (AKTs), each revealing coordinated articulator movements toward specific vocal tract shapes. AKTs captured a wide range of movement types, yet they could be differentiated by the place of vocal tract constriction. Additionally, AKTs manifested out-and-back trajectories with harmonic oscillator dynamics. While AKTs were functionally stereotyped across different sentences, context-dependent encoding of preceding and following movements during production of the same phoneme demonstrated the cortical representation of coarticulation. Articulatory movements encoded in sensorimotor cortex give rise to the complex kinematics underlying continuous speech production.

| S-EPMC5992088 | biostudies-literature
