Dataset Information

Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network.

ABSTRACT: The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and further quantitative analysis of laryngeal high-speed video (HSV). Quantification of the vocal fold vibration patterns requires, as a first step, the segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation process. In this work we propose for the first time a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. As performance measures, the Dice Coefficient (DC) as well as the precision of four anatomical landmark positions were used. Over all test data, a mean DC of 0.85 was obtained for the glottis and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. Thus, it also allows for the analysis of long HSV sequences and holds the promise to facilitate the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
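
The two performance measures named in the abstract can be illustrated with a short sketch. This is not the authors' evaluation code: the function names `dice_coefficient` and `mean_landmark_error`, the nested-list mask representation, the convention for two empty masks, and the reading of landmark precision as a mean Euclidean pixel distance are all assumptions made for illustration.

```python
import math


def dice_coefficient(pred, truth):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two binary masks.

    Masks are nested lists (rows of 0/1 values) of identical shape.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels set in both masks
    total = 0         # pixels set in pred plus pixels set in truth
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            total += int(p) + int(t)

    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def mean_landmark_error(pred_points, ref_points):
    """Mean Euclidean distance (in pixels) between paired landmarks.

    Assumption: 'precision' is interpreted here as average pixel distance.
    """
    if not pred_points or len(pred_points) != len(ref_points):
        raise ValueError("need two non-empty point lists of equal length")
    distances = [math.dist(p, r) for p, r in zip(pred_points, ref_points)]
    return sum(distances) / len(distances)


# Hypothetical toy data, not taken from the dataset.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
ref_mask = [[0, 1, 0],
            [0, 1, 1]]
dc = dice_coefficient(pred_mask, ref_mask)  # 2 * 2 / (3 + 3)

pred_landmarks = [(0.0, 0.0), (3.0, 4.0)]
ref_landmarks = [(0.0, 0.0), (0.0, 0.0)]
err = mean_landmark_error(pred_landmarks, ref_landmarks)  # (0 + 5) / 2
```

In this reading, a DC of 1.0 means the predicted and reference regions coincide exactly, and a DC of 0.0 means they do not overlap at all.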

SUBMITTER: Fehling MK

PROVIDER: S-EPMC7010264 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network.

Mona Kirstin Fehling, Fabian Grosch, Maria Elke Schuster, Bernhard Schick, Jörg Lohscheller

PLoS One, 2020-02-10, issue 2

PMID: 32040514

Similar Datasets

Project description:
Background: Pathological atrial fibrosis is a major contributor to sustained atrial fibrillation. Currently, late gadolinium enhancement (LGE) scans provide the only noninvasive estimate of atrial fibrosis. However, widespread adoption of atrial LGE has been hindered partly by nonstandardized image processing techniques, which can be operator and algorithm dependent. Minimal validation and limited access to transparent software platforms have also exacerbated the problem. This study aims to estimate atrial fibrosis from cardiac magnetic resonance scans using a reproducible operator-independent fully automatic open-source end-to-end pipeline.
Methods: A multilabel convolutional neural network was designed to accurately delineate atrial structures including the blood pool, pulmonary veins, and mitral valve. The output from the network removed the operator dependent steps in a reproducible pipeline and allowed for automated estimation of atrial fibrosis from LGE-cardiac magnetic resonance scans. The pipeline results were compared against manual fibrosis burdens, calculated using published thresholds: image intensity ratio 0.97, image intensity ratio 1.61, and mean blood pool signal +3.3 SD.
Results: We validated our methods on a large 3-dimensional LGE-cardiac magnetic resonance data set from 207 labeled scans. Automatic atrial segmentation achieved a 91% Dice score, compared with the mutual agreement of 85% in Dice seen in the interobserver analysis of operators. Intraclass correlation coefficients of the automatic pipeline with manually generated results were excellent and better than or equal to interobserver correlations for all 3 thresholds: 0.94 versus 0.88, 0.99 versus 0.99, 0.99 versus 0.96 for image intensity ratio 0.97, image intensity ratio 1.61, and +3.3 SD thresholds, respectively. Automatic analysis required 3 minutes per case on a standard workstation. The network and the analysis software are publicly available.
Conclusions: Our pipeline provides a fully automatic estimation of fibrosis burden from LGE-cardiac magnetic resonance scans that is comparable to manual analysis. This removes one key source of variability in the measurement of atrial fibrosis.

Project description:
Motivation: Human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, endoscopic investigation of the actual phonatory process in detail is challenging. Hence the biomechanics of the human phonatory process are still not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds towards vocal fold oscillations to quantify gender and age related differences expressed by computed biomechanical model parameters.
Methods: The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness and subglottal pressure. For adapting the model towards the recorded vocal fold dynamics, three different optimization algorithms (Nelder-Mead, Particle Swarm Optimization and Simulated Bee Colony) in combination with three cost functions were considered for applicability. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed.
Results and conclusion: The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function seems to be suitable for this optimization task. The gained model parameters reflect the phonatory biomechanics for men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness and higher masses than the corresponding elderly groups. Females exhibited higher subglottal pressures, smaller oscillation masses and larger stiffness than the corresponding similar aged male groups. Optimizing numerical models towards vocal fold oscillations is useful to identify underlying laryngeal components controlling the phonatory process.
