Dataset Information

A procedure for estimating gestural scores from speech acoustics.

ABSTRACT: Speech can be represented as a constellation of constricting vocal tract actions called gestures, whose temporal patterning with respect to one another is expressed in a gestural score. Current speech datasets do not come with gestural annotation, and no formal gestural annotation procedure exists at present. This paper describes an iterative analysis-by-synthesis landmark-based time-warping architecture to perform gestural annotation of natural speech. For a given utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model is employed to generate a corresponding prototype gestural score. The gestural score is temporally optimized through an iterative time-warping process such that the acoustic distance between the original and TADA-synthesized speech is minimized. This paper demonstrates that the proposed iterative approach is superior to conventional acoustically referenced dynamic time-warping procedures and provides reliable gestural annotation for speech datasets.
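The abstract describes the optimization loop only in outline. The Python sketch below illustrates the general idea under loudly stated assumptions; it is not the authors' implementation. `toy_synthesize` is a hypothetical stand-in for TADA that turns a gestural score (gesture name mapped to onset/offset frames) into one binary activation feature per gesture; `frame_distance` (mean Euclidean frame distance) is an assumed acoustic distance; and a plain dynamic time-warping (DTW) path is used to move each gesture's onset and offset landmarks onto the original timeline. A warped score is kept only if it lowers the distance, so the loop stops as soon as an iteration fails to improve.

```python
import numpy as np


def dtw(x, y):
    """Dynamic time warping between feature sequences x (n, d) and y (m, d).

    Returns the accumulated cost and the optimal path as (i, j) pairs,
    where i indexes x (original) and j indexes y (synthesized).
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distance between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from (n, m) to (1, 1), preferring the diagonal step on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, m]), path[::-1]


def toy_synthesize(score, n_frames):
    """HYPOTHETICAL stand-in for TADA: one feature per gesture, 1.0 while active."""
    names = sorted(score)
    feats = np.zeros((n_frames, len(names)))
    for k, g in enumerate(names):
        on, off = score[g]
        feats[on:off + 1, k] = 1.0  # onset and offset frames are inclusive
    return feats


def frame_distance(a, b):
    """ASSUMED acoustic distance: mean Euclidean distance between aligned frames."""
    return float(np.mean(np.linalg.norm(a - b, axis=-1)))


def warp_score(score, path, n_syn):
    """Map each gesture's onset/offset landmark from synthesized to original frames."""
    first = np.full(n_syn, -1)
    last = np.full(n_syn, -1)
    for i, j in path:
        if first[j] < 0:
            first[j] = i  # earliest original frame aligned to synthesized frame j
        last[j] = i  # latest original frame aligned to synthesized frame j
    return {g: (int(first[on]), int(last[off])) for g, (on, off) in score.items()}


def iterative_warp(score, original, synthesize, n_iter=10):
    """Synthesize, align, warp landmarks; keep a new score only if distance drops."""
    n = len(original)
    best = dict(score)
    best_cost = frame_distance(original, synthesize(best, n))
    for _ in range(n_iter):
        synth = synthesize(best, n)
        _, path = dtw(original, synth)
        candidate = warp_score(best, path, len(synth))
        cost = frame_distance(original, synthesize(candidate, n))
        if cost >= best_cost:
            break  # no further improvement
        best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical example: the prototype score is mistimed relative to the "original".
    original = toy_synthesize({"lips": (3, 7), "tongue": (9, 13)}, 16)
    prototype = {"lips": (1, 4), "tongue": (6, 12)}
    estimate, cost = iterative_warp(prototype, original, toy_synthesize)
    print(estimate, cost)
```

Accepting a candidate only when the distance strictly decreases makes the loop monotone: the returned distance is never worse than that of the prototype score. A realistic version would compare acoustic features (for example, spectral frames) of the original and TADA-synthesized speech rather than the toy activation features used here.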

SUBMITTER: Nam H

PROVIDER: S-EPMC3528686 | biostudies-other | 2012 Dec

REPOSITORIES: biostudies-other

Similar Datasets

Project description:
Purpose: The classroom acoustic standard ANSI/ASA S12.60-2010/Part 1 requires a reverberation time (RT) of 0.3 s for children with hearing impairment, shorter than its 0.6-s requirement for children with typical hearing. While preliminary data from conference proceedings support this new 0.3-s RT requirement, no peer-reviewed data supporting it are available for children who wear hearing aids. To help address this gap, this article compares speech perception by children with hearing aids across several RTs, including those specified in the ANSI/ASA-2010 standard. A related clinical issue is whether speech perception assessments conducted in near-anechoic sound booths, which may overestimate performance in reverberant classrooms, now provide a more reliable estimate when the child is in a classroom with a short RT of 0.3 s. To address this, the study compared speech perception by children with hearing aids in a sound booth with their listening at 0.3-s RT.
Method: Participants listened in classroom RTs of 0.3, 0.6, and 0.9 s and in a near-anechoic sound booth. All conditions also included a 21-dB range of speech-to-noise ratios (SNRs) to further represent classroom listening environments. Performance was measured with the Bamford-Kowal-Bench Speech-in-Noise (BKB-SIN) test as 50% correct word recognition across these acoustic conditions, with supplementary analyses of percent correct.
Results: Each reduction in RT, from 0.9 to 0.6 to 0.3 s, significantly improved the children's speech perception. Scores obtained in the sound booth were significantly better than those measured at 0.3-s RT.
Conclusion: These results support the acoustic standard of 0.3-s RT for children with hearing impairment in learning spaces ≤ 283 m³, as specified in ANSI/ASA S12.60-2010/Part 1. Additionally, speech perception testing in a sound booth did not accurately predict listening ability in a classroom with 0.3-s RT.
Supplemental Material https://doi.org/10.23641/asha.11356487.
