Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging.
ABSTRACT: This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics.
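Since the EMA and rtMRI corpora were recorded at different times with the same stimuli, temporal alignment is a natural first step before any spatial registration. Below is a minimal sketch of one common way to do this, assuming dynamic time warping (DTW) over MFCC features of the audio recorded alongside each modality; the file names are hypothetical, and the paper's actual alignment procedure may differ in its features and constraints.

```python
# Sketch: temporally align EMA and rtMRI recordings of the same stimuli
# via DTW on MFCCs of the audio captured alongside each modality.
# One common approach, not necessarily the paper's exact method.
import librosa
import numpy as np

SR = 16000  # analysis sample rate (assumption)

def mfcc_features(wav_path):
    """Load audio and compute MFCCs as an (n_mfcc, n_frames) matrix."""
    y, sr = librosa.load(wav_path, sr=SR)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

ema_feats = mfcc_features("ema_session_audio.wav")    # hypothetical path
mri_feats = mfcc_features("rtmri_session_audio.wav")  # hypothetical path

# DTW returns the accumulated cost matrix and the optimal warping path,
# i.e. a sequence of (ema_frame, mri_frame) index pairs.
cost, warp_path = librosa.sequence.dtw(X=ema_feats, Y=mri_feats)
warp_path = warp_path[::-1]  # path is returned end-to-start

# The warping path can then map EMA sample times onto rtMRI frame times
# (and vice versa) before spatial registration of the two coordinate frames.
print(f"alignment cost: {cost[-1, -1]:.1f}, path length: {len(warp_path)}")
```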
Project description: Purpose: Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and by acoustic and kinematic data. An analysis of apraxic speech errors within a dynamic systems framework is provided, and the nature of the pathomechanisms of apraxic speech is discussed. Method: One adult male speaker with apraxia of speech was imaged using real-time MRI while producing spontaneous speech, repeated naming tasks, and self-paced repetition of word pairs designed to elicit speech errors. Articulatory data were analyzed, and speech errors were detected using time series reflecting articulatory activity in regions of interest. Results: Real-time MRI captured two types of apraxic gestural intrusion errors in a word pair repetition task. Gestural intrusion errors in nonrepetitive speech, multiple silent initiation gestures at the onset of speech, and covert (unphonated) articulation of entire monosyllabic words were also captured. Conclusion: Real-time MRI and its accompanying analytical methods capture and quantify many features of apraxic speech that have previously been observed using other modalities, while offering high spatial resolution. This patient's apraxia of speech affected his ability to select only the appropriate vocal tract gestures for a target utterance, suppressing others, and to coordinate them in time.
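The region-of-interest time series mentioned above can be illustrated with a short sketch: summarizing pixel intensity within a fixed vocal-tract region across rtMRI frames yields a one-dimensional signal of articulatory activity whose peaks and derivatives mark constriction gestures. The ROI coordinates and synthetic data below are illustrative assumptions, not the study's actual configuration.

```python
# Sketch of the region-of-interest (ROI) time-series idea: summarize
# pixel intensity in a fixed vocal-tract region across rtMRI frames,
# yielding a 1-D signal of articulatory activity.
import numpy as np

def roi_time_series(frames, row_slice, col_slice):
    """frames: (n_frames, H, W) array of rtMRI pixel intensities.
    Returns the mean intensity inside the ROI for each frame."""
    return frames[:, row_slice, col_slice].mean(axis=(1, 2))

# Synthetic data standing in for a real image sequence (illustrative).
rng = np.random.default_rng(0)
frames = rng.random((500, 84, 84))        # e.g. 500 frames of 84x84 px
tongue_tip = roi_time_series(frames, slice(40, 55), slice(20, 35))

# Constrictions and releases appear as peaks and valleys; a simple
# derivative highlights rapid intensity change (gesture onsets/offsets).
velocity = np.gradient(tongue_tip)
```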
Project description: Tools available for reproducible, quantitative assessment of brain correspondence have been limited. We previously validated the anatomical fiducial (AFID) placement protocol for point-based assessment of image registration with millimetric (mm) accuracy. In this data descriptor, we release curated AFID placements for some of the most commonly used structural magnetic resonance imaging datasets and templates. The release of our accurate placements allows for rapid quality control of image registration, teaching of neuroanatomy, and clinical applications such as disease diagnosis and surgical targeting. We release placements on individual subjects from four datasets (N = 132 subjects for a total of 15,232 fiducials) and 14 brain templates (4,288 fiducials), totalling more than 300 human rater hours of annotation. We also validate the human rater accuracy of the released placements to be within 1-2 mm (using more than 45,000 Euclidean distances), consistent with prior studies. Our data are compliant with the Brain Imaging Data Structure, allowing for facile incorporation into neuroimaging analysis pipelines.
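The rater-accuracy validation described above reduces to per-fiducial Euclidean distances between placements in scanner (mm) coordinates. The sketch below shows that computation on made-up coordinates; the labels and values are illustrative, not real AFID data.

```python
# Sketch of the point-based accuracy computation behind AFIDs: the error
# between two raters' placements of the same fiducials is the per-point
# Euclidean distance (in mm, given coordinates in scanner space).
import numpy as np

# Illustrative placements for two fiducials by two raters (not real data).
rater_a = np.array([[0.2, -28.1, 1.5],
                    [0.1, -56.3, 2.0]])
rater_b = np.array([[0.5, -27.6, 1.1],
                    [0.3, -55.9, 2.4]])

afid_errors = np.linalg.norm(rater_a - rater_b, axis=1)
print(afid_errors)          # per-fiducial error in mm
print(afid_errors.mean())   # mean rater error; the paper reports 1-2 mm
```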
Project description: Magnetoencephalography (MEG) can non-invasively measure the electromagnetic activity of the brain. A new type of MEG, on-scalp MEG, has recently attracted the attention of researchers. Compared to conventional SQUID-MEG, on-scalp MEG constructed with optically pumped magnetometers is wearable and has a high signal-to-noise ratio. Because the co-registration between MEG and magnetic resonance imaging (MRI) significantly influences source localization accuracy, the co-registration error requires assessment and quantification. Recent studies have evaluated the co-registration error of on-scalp MEG mainly based on the surface fit error or the repeatability error of different measurements, neither of which reflects the true co-registration error. In this study, a three-dimensional-printed reference phantom was constructed to provide the ground truth of MEG sensor locations and orientations relative to MRI. The co-registration performance of three commonly used devices (an electromagnetic digitization system, a structured-light scanner, and a laser scanner) was compared and quantified using the final co-registration errors in reference phantom and human experiments. Furthermore, the influence of the co-registration error on source localization performance was analyzed via simulations. The laser scanner had the best co-registration accuracy (rotation error of 0.23° and translation error of 0.76 mm in the phantom experiment), whereas the structured-light scanner had the best cost performance. The results of this study provide recommendations and precautions for researchers selecting and using a device for the co-registration of on-scalp MEG and MRI.
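The rotation and translation error indices reported above can be illustrated by fitting a rigid transform between ground-truth and measured sensor positions and reading off the residual rotation angle and translation magnitude. The sketch below uses the standard Kabsch algorithm on synthetic points; it illustrates the error metrics, not the study's actual pipeline.

```python
# Sketch: quantify co-registration error against a phantom ground truth
# by fitting the rigid transform between measured and true sensor
# positions (Kabsch algorithm); the residual rotation/translation
# relative to identity is the co-registration error.
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) minimizing ||R @ p + t - q|| over rows of P, Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# True phantom sensor positions vs. noisy measured positions (illustrative).
true_pos = np.random.default_rng(1).random((8, 3)) * 100   # mm
measured = true_pos + np.random.default_rng(2).normal(0, 0.5, true_pos.shape)

R, t = kabsch(measured, true_pos)
rot_err_deg = np.degrees(np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0)))
trans_err_mm = np.linalg.norm(t)
print(f"rotation error: {rot_err_deg:.2f} deg, translation error: {trans_err_mm:.2f} mm")
```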
Project description: Real-time magnetic resonance imaging (rtMRI) was used to examine mechanisms of sound production by an American male beatbox artist. rtMRI was found to be a useful modality with which to study this form of sound production, providing a global dynamic view of the midsagittal vocal tract at frame rates sufficient to observe the movement and coordination of critical articulators. The subject's repertoire included percussion elements generated using a wide range of articulatory and airstream mechanisms. Many of the same mechanisms observed in human speech production were exploited for musical effect, including patterns of articulation that do not occur in the phonologies of the artist's native languages: ejectives and clicks. The data offer insights into the paralinguistic use of phonetic primitives and the ways in which they are coordinated in this style of musical performance. A unified formalism for describing both musical and phonetic dimensions of human vocal percussion performance is proposed. Audio and video data illustrating production and orchestration of beatboxing sound effects are provided in a companion annotated corpus.
Project description: Background and objective: Magnetic resonance (MR) imaging is increasingly used in studies of speech as it enables non-invasive visualisation of the vocal tract and articulators, thus providing information about their shape, size, motion and position. Extraction of this information for quantitative analysis is achieved using segmentation. Methods have been developed to segment the vocal tract; however, none of these also fully segments any articulators. The objective of this work was to develop a method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech, thus overcoming the limitations of existing methods. Methods: Five speech MR image sets (392 MR images in total), each of a different healthy adult volunteer, were used in this work. A fully convolutional network with an architecture similar to the original U-Net was developed to segment the following six regions in the image sets: the head, soft palate, jaw, tongue, vocal tract and tooth space. A five-fold cross-validation was performed to investigate the segmentation accuracy and generalisability of the network. The segmentation accuracy was assessed using standard overlap-based metrics (Dice coefficient and general Hausdorff distance) and a novel clinically relevant metric based on velopharyngeal closure. Results: The segmentations created by the method had a median Dice coefficient of 0.92 and a median general Hausdorff distance of 5 mm. The method segmented the head most accurately (median Dice coefficient of 0.99), and the soft palate and tooth space least accurately (median Dice coefficients of 0.92 and 0.93 respectively). The segmentations created by the method correctly showed 90% (27 out of 30) of the velopharyngeal closures in the MR image sets. Conclusions: An automatic method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech was successfully developed. The method is intended for use in clinical and non-clinical speech studies that involve quantitative analysis of the shape, size, motion and position of the vocal tract and articulators. In addition, a novel clinically relevant metric for assessing the accuracy of vocal tract and articulator segmentation methods was developed.
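The two overlap-based metrics named above are standard and easy to reproduce. The sketch below computes the Dice coefficient on binary masks and a symmetric Hausdorff distance from SciPy's directed variant; the masks are illustrative, and the study's exact Hausdorff implementation (units, boundary handling) may differ.

```python
# Sketch of the two standard overlap metrics used to score segmentations:
# Dice coefficient on binary masks, and a symmetric Hausdorff distance
# built from SciPy's directed variant.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    """Dice coefficient of two boolean masks (1.0 = perfect overlap)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True    # predicted mask
truth = np.zeros((64, 64), bool); truth[12:42, 12:42] = True  # ground truth

print(f"Dice: {dice(pred, truth):.3f}")

# Symmetric Hausdorff distance = max of the two directed distances,
# computed over foreground pixel coordinates (here in pixels, not mm).
p_pts, t_pts = np.argwhere(pred), np.argwhere(truth)
hd = max(directed_hausdorff(p_pts, t_pts)[0],
         directed_hausdorff(t_pts, p_pts)[0])
print(f"Hausdorff: {hd:.1f} px")
```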
Project description: Fast, minimally invasive, high-resolution intravascular imaging is essential for identifying vascular pathological features and for developing novel diagnostic tools and treatments. Intravascular magnetic resonance imaging (MRI) with active internal probes offers high sensitivity to pathological features without ionizing radiation or the limited luminal views of conventional X-rays, but has been unable to provide a high-speed, high-resolution, endoscopic view. Herein, real-time MRI endoscopy is introduced for performing MRI from a viewpoint intrinsically locked to a miniature active, internal transmitter-receiver in a clinical 3.0-T MRI scanner. Real-time MRI endoscopy at up to 2 frames/s depicts vascular wall morphological features, atherosclerosis, and calcification at 80 to 300 μm resolution during probe advancement through diseased human iliac artery specimens and atherosclerotic rabbit aortas in vivo. MRI endoscopy offers the potential for fast, minimally invasive, transluminal, high-resolution imaging of vascular disease on a common clinical platform suitable for evaluating and targeting atherosclerosis in both experimental and clinical settings.
Project description: PURPOSE: To improve the depiction and tracking of vocal tract articulators in spiral real-time MRI (RT-MRI) of speech production by estimating and correcting for dynamic changes in off-resonance. METHODS: The proposed method computes a dynamic field map from the phase of single-TE dynamic images after a coil phase compensation, where complex coil sensitivity maps are estimated from the single-TE dynamic scan itself. The method is tested using simulations and in vivo data. The depiction of air-tissue boundaries is evaluated quantitatively using a sharpness metric and visual inspection. RESULTS: Simulations demonstrate that the proposed method provides robust off-resonance correction for spiral readout durations up to 5 ms at 1.5T. In vivo experiments during human speech production demonstrate that image sharpness is improved in a majority of data sets at air-tissue boundaries, including the upper lip, hard palate, soft palate, and tongue, whereas the lower lip shows little improvement in edge sharpness after correction. CONCLUSION: Dynamic off-resonance correction is feasible from single-TE spiral RT-MRI data and provides a practical performance improvement in articulator sharpness when applied to speech production imaging.
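The core relation behind this correction is that, after coil phase compensation, the residual phase of a single-TE image is approximately 2πΔf·TE, so an off-resonance (field) map in Hz can be read off each frame's phase. The sketch below demonstrates only this relation on synthetic data; the coil-map estimation and subsequent spiral deblurring that the full method requires are omitted, and the TE value is illustrative.

```python
# Sketch of the single-TE field-map relation: residual image phase
# approximately equals 2*pi*df*TE, so off-resonance df (Hz) follows
# directly from each frame's phase after coil phase compensation.
import numpy as np

TE = 0.8e-3  # echo time in seconds (illustrative value)

def dynamic_field_map(frame, coil_phase):
    """frame: complex reconstructed image; coil_phase: estimated coil
    sensitivity phase. Returns per-pixel off-resonance in Hz."""
    compensated = frame * np.exp(-1j * coil_phase)   # remove coil phase
    return np.angle(compensated) / (2 * np.pi * TE)

# Synthetic example: a uniform 40 Hz off-resonance encoded into the phase.
true_hz = 40.0
img = np.exp(1j * 2 * np.pi * true_hz * TE) * np.ones((8, 8))
print(dynamic_field_map(img, np.zeros((8, 8)))[0, 0])  # ~40.0 Hz
```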
Project description: Automatic speech recognition (ASR) has been an active area of research. Training with large annotated datasets is key to developing robust ASR systems. However, most available datasets focus on high-resource languages like English, leaving a significant gap for low-resource languages. Among these languages is Punjabi: despite its large number of speakers, Punjabi lacks the high-quality annotated datasets needed for accurate speech recognition. To address this gap, we introduce three labeled Punjabi speech datasets: Punjabi Speech (a real-speech dataset) and Google-synth/CMU-synth (synthesized-speech datasets). The Punjabi Speech dataset consists of read speech recordings captured in various environments, including both studio and open settings. The Google-synth dataset is synthesized using Google's Punjabi text-to-speech cloud services, and the CMU-synth dataset is created using the Clustergen model available in the Festival speech synthesis system developed by CMU. These datasets aim to facilitate the development of accurate Punjabi speech recognition systems, bridging the resource gap for this important language.
Project description: Magnetic resonance imaging (MRI) of the cardiovascular system has proven to be an invaluable diagnostic tool. Because it allows real-time imaging, MRI guidance of intraoperative procedures can provide superb visualization, facilitating a variety of interventions and minimizing operative trauma. In addition to anatomic detail, MRI can provide intraoperative assessment of organ and device function. Instruments and devices can be marked to enhance visualization and tracking, all of which is an advance over standard X-ray or ultrasonic imaging.