Dataset Information

Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis.

ABSTRACT: Statistical parametric speech synthesis based on Hidden Markov Models has been an important technique for the production of artificial voices, due to its ability to produce results with high intelligibility and sophisticated features such as voice conversion and accent modification with a small footprint, particularly for low-resource languages where deep learning-based techniques remain unexplored. Despite the progress, the quality of the results, mainly based on Hidden Markov Models (HMM), does not reach that of the predominant approaches, based on unit selection of speech segments or deep learning. One of the proposals to improve the quality of HMM-based speech has been incorporating postfiltering stages, which aim to increase the quality while preserving the advantages of the process. In this paper, we present a new approach to postfiltering synthesized voices with the application of discriminative postfilters, with several long short-term memory (LSTM) deep neural networks. Our motivation stems from modeling specific mappings from synthesized to natural speech on those segments corresponding to voiced or unvoiced sounds, due to the different qualities of those sounds and how HMM-based voices can present distinct degradation on each one. The paper analyses the discriminative postfilters obtained using five voices, evaluated using three objective measures, Mel cepstral distance and subjective tests. The results indicate the advantages of the discriminative postfilters in comparison with the HTS voice and the non-discriminative postfilters.

SUBMITTER: Coto-Jimenez M

PROVIDER: S-EPMC7985793 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature

Publications

Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis.

Coto-Jiménez Marvin M

Biomimetics (Basel, Switzerland) 20210207 1

PMID: 33562420

Similar Datasets

Project description:

Introduction: This study aims to develop an imaging model based on multi-parametric MR images for distinguishing between prostate cancer (PCa) and prostate hyperplasia.

Methods: A total of 236 subjects were enrolled and divided into training and test sets for model construction. First, a multi-view radiomics modeling strategy was designed in which different combinations of radiomics feature categories (original, LoG, and wavelet) were compared to obtain the optimal input feature sets. Minimum-redundancy maximum-relevance (mRMR) selection and the least absolute shrinkage and selection operator (LASSO) were used for feature reduction, and logistic regression was then used for model construction. Next, a Swin Transformer architecture was designed and trained using transfer learning techniques to construct the deep learning (DL) models. Finally, the constructed multi-view radiomics and DL models were combined and compared for model selection and nomogram construction. The prediction accuracy, consistency, and clinical benefit were comprehensively evaluated in the model comparison.

Results: The optimal input feature set was found when LoG and wavelet features were combined; 22 and 17 radiomic features in this set were selected to construct the ADC and T2 multi-view radiomic models, respectively. ADC and T2 DL models were built by transfer learning from a large number of natural images to a relatively small sample of prostate images. All individual and combined models showed good predictive accuracy, consistency, and clinical benefit. Compared with using only an ADC-based model, adding a T2-based model to the combined model would reduce the model's predictive performance. The ADCCombinedScore model showed the best predictive performance among all and was transformed into a nomogram for better use in clinics.

Discussion: The constructed models in our study can be used as a predictor in differentiating PCa and BPH, thus helping clinicians make better clinical treatment decisions and reducing unnecessary prostate biopsies.
