Dataset Information

Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition.

ABSTRACT: Serbian belongs to a group of highly inflective, morphologically rich languages that use many different word suffixes to express grammatical, syntactic, or semantic features. This behaviour usually produces many recognition errors, especially in large vocabulary systems: even when the automatic speech recognition system predicts the correct lemma thanks to good acoustic matching, it often produces a wrong word ending, which is nevertheless counted as an error. This effect is larger for contexts not present in the language model training corpus. In this manuscript, an approach to language modeling that takes into account different morphological categories of words is examined, and the benefits in terms of word error rates and perplexities are presented. These categories include word type, word case, grammatical number, and gender, and they were all assigned to words in the system vocabulary, where applicable. These additional word features helped to produce significant improvements over the baseline system, both for n-gram-based and neural network-based language models. The proposed system can help overcome many tedious errors in large vocabulary systems, for example in dictation, both for Serbian and for other languages with similar characteristics.

SUBMITTER: Pakoci E

PROVIDER: S-EPMC6421827 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition.

Edvin Pakoci, Branislav Popović, Darko Pekar

Computational Intelligence and Neuroscience, 2019-03-03

PMID: 30944554

Similar Datasets

Project description: A speech emotion recognition system determines a speaker's emotional state by analyzing his or her speech audio signal. It is an essential and, at the same time, challenging task in human-computer interaction systems, and one of the most demanding areas of research using artificial intelligence and deep machine learning architectures. Despite being the world's seventh most widely spoken language, Bangla is still classified as a low-resource language for speech emotion recognition tasks because of inadequate availability of data. There is an apparent lack of speech emotion recognition datasets for this type of research in the Bangla language. This article presents a Bangla language-based emotional speech-audio recognition dataset, BanglaSER, to address this problem. It consists of speech-audio data from 34 participating speakers from diverse age groups between 19 and 47 years, with a balanced 17 male and 17 female nonprofessional participating actors. The dataset contains 1467 Bangla speech-audio recordings of five rudimentary human emotional states, namely angry, happy, neutral, sad, and surprise. Three trials are conducted for each emotional state. Hence, the total number of recordings involves 3 statements × 3 repetitions × 4 emotional states (angry, happy, sad, and surprise) × 34 participating speakers = 1224 recordings, plus 3 statements × 3 repetitions × 1 emotional state (neutral) × 27 participating speakers = 243 recordings, giving a total of 1467 recordings. The BanglaSER dataset was created by recording speech audio on smartphones and laptops, has a balanced number of recordings in each category with evenly distributed male and female actors, and would serve as an essential training dataset for Bangla speech emotion recognition models in terms of generalization.
BanglaSER is compatible with various deep learning architectures such as convolutional neural networks, long short-term memory, gated recurrent units, Transformers, etc. The dataset is available at https://data.mendeley.com/datasets/t9h6p943xy/5 and can be used for research purposes.

Project description: This study investigated Chinese English-as-a-foreign-language (EFL) learners' use of vocabulary learning strategies (VLSs) and its relationship with vocabulary knowledge (VK), especially in relation to proficiency, gender, and discipline. Structural equation models were established following exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) procedures, and mediation analyses and multiple-group analyses, as well as analyses of variance, were conducted. The data comprised 419 sophomores' strategy use frequency, Vocabulary Size Test (VST) scores (indicative of breadth of VK), Word Associates Test (WAT) scores (indicative of depth of VK), College English Test Band-4 scores, and gender and discipline categories. Proficiency significantly predicted Attention and Guessing positively but was a negative predictor of Socializing (asking others for help). Girls liked making notes while using dictionaries (DictNote) and Socializing, and students of arts also took more notes. Attention and Guessing significantly predicted VST and WAT positively, but Socializing significantly predicted the breadth and depth of VK negatively, and DictNote, Association, and Repetition had no significant relationship. The predictive power of Attention, Guessing, and Socializing, however, was achieved mainly, or in large part, via the mediating or indirect effects of proficiency. Gender moderated the predictive power of Attention, Socializing, and DictNote over VST, greater for male students, whereas discipline moderated the relationship between Guessing and WAT, stronger for arts students. The findings are related to strategy features, gender characteristics, disciplinary influence, the EFL context and culture, and effective learning. This study reveals the complex relationships among use of VLSs, VK, and learner variables. Attention to third-party factors is called for in understanding VLSs-VK relationships.
Given the important mediating effects of proficiency, it is proposed that vocabulary learning be strategically integrated into the accumulative process of English learning.
