Project description: Automatic emotion recognition is a highly challenging task. To detect emotion from nonstationary EEG signals, a sophisticated learning algorithm that can represent high-level abstraction is required. This study proposes the use of a deep learning network (DLN) to discover unknown feature correlations among input signals that are crucial for the learning task. The DLN is implemented as a stacked autoencoder (SAE) using a hierarchical feature learning approach. The input features of the network are the power spectral densities of 32-channel EEG signals from 32 subjects. To alleviate the overfitting problem, principal component analysis (PCA) is applied to extract the most important components of the initial input features. Furthermore, covariate shift adaptation of the principal components is implemented to minimize the nonstationary effect of EEG signals. Experimental results show that the DLN is capable of classifying three different levels of valence and arousal with accuracies of 49.52% and 46.03%, respectively. Principal-component-based covariate shift adaptation enhances the respective classification accuracies by 5.55% and 6.53%. Moreover, the DLN provides better performance than SVM and naive Bayes classifiers.
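A minimal sketch of this kind of front end, assuming hypothetical placeholder arrays eeg_trials and labels and arbitrary layer sizes; it computes Welch power spectral densities per channel, reduces them with PCA, and trains a small stacked-autoencoder-style encoder with a softmax head (layerwise pretraining and the covariate shift adaptation step are omitted):

import numpy as np
import torch
import torch.nn as nn
from scipy.signal import welch
from sklearn.decomposition import PCA

def psd_features(eeg_trials, fs=128):
    # eeg_trials: (n_trials, n_channels, n_samples) -> (n_trials, n_channels * n_bins)
    feats = []
    for trial in eeg_trials:
        f, pxx = welch(trial, fs=fs, nperseg=fs)        # PSD per channel
        feats.append(pxx[:, (f >= 4) & (f <= 45)].ravel())
    return np.asarray(feats)

eeg_trials = np.random.randn(64, 32, 128 * 60)          # placeholder EEG data
labels = np.random.randint(0, 3, 64)                    # placeholder 3-level valence labels

X = PCA(n_components=50).fit_transform(psd_features(eeg_trials))
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(labels)

model = nn.Sequential(                                  # SAE-style encoder + softmax head
    nn.Linear(50, 100), nn.Sigmoid(),
    nn.Linear(100, 50), nn.Sigmoid(),
    nn.Linear(50, 3),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()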
Project description: Robust speech emotion recognition relies on the quality of the speech features. We present a speech feature enhancement strategy that improves speech emotion recognition. We used the INTERSPEECH 2010 challenge feature set, identified subsets within it, and applied principal component analysis to each subset. Finally, the features were fused horizontally. The resulting feature set is analyzed using t-distributed stochastic neighbour embedding (t-SNE) before the features are applied to emotion recognition. The method is compared with state-of-the-art methods from the literature. The empirical evidence is drawn from two well-known datasets, the Berlin Emotional Speech Dataset (EMO-DB) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), covering two languages, German and English, respectively. Our method achieved an average recognition gain of 11.5% for six out of seven emotions on the EMO-DB dataset and 13.8% for seven out of eight emotions on the RAVDESS dataset compared to the baseline study.
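A rough sketch of the described subset-wise PCA, horizontal fusion, and t-SNE inspection, assuming a placeholder feature matrix and an arbitrary split of the INTERSPEECH 2010 feature columns into subsets (the actual grouping used in the study is not specified here):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

features = np.random.randn(535, 1582)                 # placeholder IS10 feature vectors
subsets = [features[:, :400], features[:, 400:900], features[:, 900:]]  # assumed grouping

reduced = [PCA(n_components=30).fit_transform(s) for s in subsets]      # PCA per subset
fused = np.hstack(reduced)                                              # horizontal fusion

embedding = TSNE(n_components=2, perplexity=30).fit_transform(fused)    # t-SNE for visual inspection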
Project description: The use of brain-computer interface (BCI) technology to identify emotional states has gained significant interest, especially with the rise of virtual reality (VR) applications. However, the extensive calibration required for precise emotion recognition models presents a significant challenge, particularly for sensitive groups such as children, the elderly, and patients. This study presents a novel approach that utilizes heterogeneous adversarial transfer learning (HATL) to synthesize electroencephalography (EEG) data from various other signal modalities, reducing the need for lengthy calibration phases. Within this framework, we benchmark the efficacy of three generative adversarial network (GAN) architectures: conditional GAN (CGAN), conditional Wasserstein GAN (CWGAN), and CWGAN with gradient penalty (CWGAN-GP). The proposed framework is rigorously tested on two conventional open-source datasets, SEED-V and DEAP. Additionally, the framework was applied to an immersive three-dimensional (3D) dataset named GraffitiVR, which we collected to capture the emotional and behavioral reactions of individuals experiencing urban graffiti in a VR environment. This expanded application provides insights into emotion recognition frameworks in VR settings and offers a wider range of contexts for assessing our methodology. When classifiers trained with CWGAN-GP-generated EEG data combined with non-EEG sensory data were compared against those trained on a combination of real EEG and non-EEG sensory data, the accuracy ratios were 93% on the SEED-V dataset, 99% on the DEAP dataset, and 97% on the GraffitiVR dataset. Moreover, in the GraffitiVR dataset, using CWGAN-GP-generated EEG data with non-EEG sensory data for emotion recognition models resulted in up to a 30% reduction in calibration time compared to classifiers trained on real EEG data with non-EEG sensory data. These results underscore the robustness and versatility of the proposed approach, significantly enhancing emotion recognition across a variety of environmental settings.
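For illustration only, a sketch of the gradient penalty term that distinguishes CWGAN-GP from plain CWGAN, written for a conditional critic over feature vectors; the feature dimensionality, condition encoding, and critic architecture are placeholders, not the study's actual design:

import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake, cond, device="cpu"):
    # Interpolate between real and synthetic EEG feature vectors and penalize
    # critic gradients whose norm deviates from 1 (the WGAN-GP term).
    eps = torch.rand(real.size(0), 1, device=device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(torch.cat([mixed, cond], dim=1))
    grads = torch.autograd.grad(outputs=score.sum(), inputs=mixed, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

critic = nn.Sequential(nn.Linear(128 + 5, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
real = torch.randn(32, 128)           # placeholder batch of real EEG features
fake = torch.randn(32, 128)           # placeholder generator output
cond = torch.zeros(32, 5)             # placeholder condition (e.g., one-hot emotion label)
gp = gradient_penalty(critic, real, fake, cond)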
Project description: Accurate recognition and understanding of human emotions is an essential skill that can improve collaboration between humans and machines. In this vein, electroencephalogram (EEG)-based emotion recognition is considered an active research field with challenging issues regarding the analysis of nonstationary EEG signals and the extraction of salient features that can be used to achieve accurate emotion recognition. In this paper, an EEG-based emotion recognition approach with a novel time-frequency feature extraction technique is presented. In particular, a quadratic time-frequency distribution (QTFD) is employed to construct a high-resolution time-frequency representation of the EEG signals and capture their spectral variations over time. To reduce the dimensionality of the constructed QTFD-based representation, a set of 13 time- and frequency-domain features is extended to the joint time-frequency domain and employed to quantify the QTFD-based time-frequency representation of the EEG signals. Moreover, to describe different emotion classes, we have utilized the 2D arousal-valence plane to develop four emotion labeling schemes of the EEG signals, such that each scheme defines a set of emotion classes. The extracted time-frequency features are used to construct a set of subject-specific support vector machine classifiers that classify the EEG signals of each subject into the emotion classes defined by each of the four labeling schemes. The performance of the proposed approach is evaluated using a publicly available EEG dataset, namely the DEAP dataset. Moreover, we design three performance evaluation analyses, namely the channel-based analysis, the feature-based analysis, and the neutral class exclusion analysis, to quantify the effects of utilizing different groups of EEG channels covering various brain regions, reducing the dimensionality of the extracted time-frequency features, and excluding the EEG signals that correspond to the neutral class on the capability of the proposed approach to discriminate between different emotion classes. The results reported in the current study demonstrate the efficacy of the proposed QTFD-based approach in recognizing different emotion classes. In particular, the average classification accuracies obtained in differentiating between the emotion classes defined by each of the four labeling schemes are within the range of 73.8% to 86.2%. Moreover, the emotion classification accuracies achieved by our proposed approach are higher than the results reported in several existing state-of-the-art EEG-based emotion recognition studies.
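Since the specific QTFD kernel and the 13 extended descriptors are not listed here, the following sketch substitutes an ordinary spectrogram for the QTFD and computes a few illustrative joint time-frequency features before fitting a subject-specific SVM; data shapes, labels, and the chosen descriptors are placeholders rather than the study's exact feature set:

import numpy as np
from scipy.signal import spectrogram
from sklearn.svm import SVC

def tf_features(signal, fs=128):
    # A spectrogram stands in for the QTFD; a handful of joint
    # time-frequency descriptors are computed from the normalized distribution.
    f, t, S = spectrogram(signal, fs=fs, nperseg=64)
    S = S / (S.sum() + 1e-12)
    mean_f = (f[:, None] * S).sum()                     # spectral centroid over time-frequency
    mean_t = (t[None, :] * S).sum()                     # temporal centroid
    entropy = -(S * np.log(S + 1e-12)).sum()            # time-frequency entropy
    flatness = np.exp(np.log(S + 1e-12).mean()) / (S.mean() + 1e-12)
    return np.array([mean_f, mean_t, entropy, flatness, S.max(), S.var()])

trials = np.random.randn(40, 32, 128 * 10)              # placeholder EEG trials
X = np.array([np.concatenate([tf_features(ch) for ch in trial]) for trial in trials])
y = np.random.randint(0, 4, 40)                         # placeholder emotion labels
clf = SVC(kernel="rbf", C=1.0).fit(X, y)                # one subject-specific SVM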
Project description: Emotion recognition from electroencephalogram (EEG) signals requires accurate and efficient signal processing and feature extraction. Deep learning technology has enabled the automatic extraction of raw EEG signal features that contribute to classifying emotions more accurately. Despite such advances, the classification of emotions from EEG signals, especially signals recorded while recalling specific memories or imagining emotional situations, has not yet been investigated. In addition, high-density EEG signal classification using deep neural networks faces challenges such as high computational complexity, redundant channels, and low accuracy. To address these problems, we evaluate the effects of using a simple channel selection method for classifying self-induced emotions based on deep learning. The experiments demonstrate that selecting key channels based on signal statistics can reduce the computational complexity by 89% without decreasing the classification accuracy. The channel selection method with the highest accuracy was the kurtosis-based method, which achieved accuracies of 79.03% and 79.36% for the valence and arousal scales, respectively. The experimental results show that the proposed framework outperforms conventional methods even though it uses fewer channels. Our proposed method can be beneficial for the effective use of EEG signals in practical applications.
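A minimal sketch of kurtosis-based channel selection, assuming that the channels with the largest kurtosis scores are retained (the exact ranking rule and channel count used in the study are assumptions here):

import numpy as np
from scipy.stats import kurtosis

def select_channels_by_kurtosis(eeg, k=8):
    # eeg: (n_channels, n_samples). Rank channels by the kurtosis of their
    # samples and keep the k channels with the largest values.
    scores = kurtosis(eeg, axis=1)
    keep = np.argsort(scores)[::-1][:k]
    return np.sort(keep)

eeg = np.random.randn(64, 5000)              # placeholder high-density recording
selected = select_channels_by_kurtosis(eeg, k=8)
reduced = eeg[selected]                      # only the selected channels feed the network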
Project description: Visual content such as movies and animation evokes various human emotions. We examine the argument that the emotion elicited by visual content may vary with the contrast of the scenes it contains. To test this argument, we consider three emotion categories: positive, neutral, and negative. We sample several scenes associated with these emotions from visual content, manipulate their contrast, and measure the resulting change in valence and arousal of human participants who watch the content, using a deep emotion recognition module based on electroencephalography (EEG) signals. We conclude that enhancing contrast increases valence, while reducing contrast decreases it; contrast control affects arousal only on a very minute scale.
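A small sketch of the contrast manipulation step, assuming scene frames are adjusted with a simple contrast factor; the synthetic frame and the factors 1.5 and 0.5 are hypothetical choices, not the values used in the study:

import numpy as np
from PIL import Image, ImageEnhance

def adjust_contrast(img, factor):
    # factor > 1 enhances contrast, factor < 1 reduces it, 1.0 leaves the frame unchanged
    return ImageEnhance.Contrast(img).enhance(factor)

frame = Image.fromarray(np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8))  # placeholder scene frame
enhanced = adjust_contrast(frame, 1.5)       # contrast-enhanced version of the scene
reduced = adjust_contrast(frame, 0.5)        # contrast-reduced version of the scene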
Project description: Background: Emotion recognition using EEG signals enables clinicians to assess patients' emotional states with precision and immediacy. However, the complexity of EEG signal data poses challenges for traditional recognition methods. Deep learning techniques effectively capture the nuanced emotional cues within these signals by leveraging extensive data. Nonetheless, most deep learning techniques lack interpretability while maintaining accuracy. Methods: We developed an interpretable end-to-end EEG emotion recognition framework rooted in a hybrid CNN and transformer architecture. Specifically, temporal convolution isolates salient information from EEG signals while filtering out potential high-frequency noise. Spatial convolution discerns the topological connections between channels. Subsequently, the transformer module processes the feature maps to integrate high-level spatiotemporal features, enabling identification of the prevailing emotional state. Results: Experimental results demonstrated that our model excels in diverse emotion classification, achieving an accuracy of 74.23% ± 2.59% on the dimensional model (DEAP) and 67.17% ± 1.70% on the discrete model (SEED-V). These results surpass the performance of both CNN- and LSTM-based counterparts. Through interpretive analysis, we ascertained that the beta and gamma bands of the EEG signals exert the most significant impact on emotion recognition performance. Notably, our model can independently tailor a Gaussian-like convolution kernel, effectively filtering high-frequency noise from the input EEG data. Discussion: Given its robust performance and interpretative capabilities, our proposed framework is a promising tool for EEG-driven emotion brain-computer interfaces.
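A simplified sketch of the temporal-convolution, spatial-convolution, then transformer pipeline described above; the kernel sizes, model width, pooling, and head are assumptions rather than the authors' exact configuration:

import torch
import torch.nn as nn

class ConvTransformerEEG(nn.Module):
    # Temporal convolution over samples, spatial convolution across channels,
    # then a transformer encoder over the resulting token sequence.
    def __init__(self, n_channels=32, n_classes=3, d_model=64):
        super().__init__()
        self.temporal = nn.Conv2d(1, d_model, kernel_size=(1, 25), padding=(0, 12))
        self.spatial = nn.Conv2d(d_model, d_model, kernel_size=(n_channels, 1))
        self.pool = nn.AvgPool2d((1, 4))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                             # x: (batch, n_channels, n_samples)
        x = self.temporal(x.unsqueeze(1))             # (batch, d_model, n_channels, n_samples)
        x = self.spatial(x)                           # (batch, d_model, 1, n_samples)
        x = self.pool(x).squeeze(2).transpose(1, 2)   # (batch, tokens, d_model)
        x = self.transformer(x)
        return self.head(x.mean(dim=1))               # average tokens, then classify

model = ConvTransformerEEG()
logits = model(torch.randn(8, 32, 512))               # placeholder batch of EEG segments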
Project description: Emotions are a critical aspect of human behavior. One widely used technique in emotion measurement research is based on EEG signals. In general terms, the first step of signal processing is the elimination of noise, which can be done manually or automatically. The next step is determining the feature vector using, for example, entropy calculation and its variations to generate a classification model. This approach can be used to classify theoretical models such as the Circumplex model, which proposes that emotions are distributed in a two-dimensional circular space. However, methods to determine the feature vector are highly susceptible to noise that may exist in the signal. In this article, a new method to adjust the classifier is proposed using a metaheuristic based on the black hole algorithm. The method aims to obtain results similar to those obtained with manual noise elimination methods. To evaluate the proposed method, the MAHNOB-HCI Tagging Database was used. Results show that by using the black hole algorithm to optimize the feature vector of the support vector machine, we obtained an accuracy of 92.56% over 30 executions.
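An illustrative sketch of a black hole metaheuristic wrapped around an SVM, here optimizing a continuous feature-weight vector by cross-validated accuracy; the fitness definition, event-horizon rule, and placeholder data are assumptions for demonstration, not the article's exact setup:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def black_hole_search(X, y, n_stars=10, n_iter=20, rng=np.random.default_rng(0)):
    # Stars are feature-weight vectors in [0, 1]; fitness is cross-validated SVM accuracy
    # on the re-weighted features. The best star acts as the black hole, and stars drifting
    # inside the event horizon are re-initialized at random.
    n_feat = X.shape[1]
    stars = rng.random((n_stars, n_feat))
    fitness = lambda w: cross_val_score(SVC(), X * w, y, cv=3).mean()
    scores = np.array([fitness(w) for w in stars])
    for _ in range(n_iter):
        bh = stars[scores.argmax()].copy()
        radius = scores.max() / (scores.sum() + 1e-12)         # event-horizon radius
        for i in range(n_stars):
            stars[i] += rng.random(n_feat) * (bh - stars[i])   # drift toward the black hole
            if np.linalg.norm(bh - stars[i]) < radius:
                stars[i] = rng.random(n_feat)                  # swallowed: re-spawn randomly
            scores[i] = fitness(stars[i])
    return stars[scores.argmax()]

X = np.random.randn(60, 20)                # placeholder entropy-based feature vectors
y = np.random.randint(0, 2, 60)            # placeholder emotion labels
best_weights = black_hole_search(X, y)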
Project description: In recent years, there has been growing interest in the study of emotion recognition through electroencephalogram (EEG) signals. One group of particular interest is individuals with hearing impairments, who may have a bias towards certain types of information when communicating with those in their environment. To address this, our study collected EEG signals from both hearing-impaired and non-hearing-impaired subjects while they viewed pictures of emotional faces for emotion recognition. Four kinds of feature matrices, namely symmetry difference and symmetry quotient matrices based on the original signal and on differential entropy (DE), were constructed to extract spatial-domain information. A multi-axis self-attention classification model was proposed, which consists of local attention and global attention, combining the attention model with convolution through a novel architectural element for feature classification. Three-class (positive, neutral, negative) and five-class (happy, neutral, sad, angry, fearful) emotion recognition tasks were carried out. The experimental results show that the proposed method is superior to the original feature method and that the multi-feature fusion achieved good results for both hearing-impaired and non-hearing-impaired subjects. The average classification accuracies were 70.2% (three-class) and 50.15% (five-class) for hearing-impaired subjects, and 72.05% (three-class) and 51.53% (five-class) for non-hearing-impaired subjects. In addition, by exploring the brain topography of different emotions, we found that, unlike those of the non-hearing-impaired subjects, the discriminative brain regions of the hearing-impaired subjects were also distributed in the parietal lobe.
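A brief sketch of differential entropy under a Gaussian assumption and of symmetry difference/quotient features over left-right electrode pairs; the electrode pairing, channel count, and data are placeholders rather than the study's actual montage:

import numpy as np

def differential_entropy(x):
    # Gaussian assumption: DE = 0.5 * ln(2 * pi * e * variance)
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x, axis=-1) + 1e-12)

def symmetry_features(eeg, pairs):
    # eeg: (n_channels, n_samples); pairs: list of (left_idx, right_idx) symmetric electrodes.
    de = differential_entropy(eeg)
    left = de[[l for l, _ in pairs]]
    right = de[[r for _, r in pairs]]
    return left - right, left / (right + 1e-12)    # symmetry difference and symmetry quotient

eeg = np.random.randn(62, 1000)                    # placeholder trial
pairs = [(0, 1), (2, 3), (4, 5)]                   # hypothetical left-right electrode pairs
diff, quot = symmetry_features(eeg, pairs)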
Project description: Speech is a complex mechanism allowing us to communicate our needs, desires, and thoughts. In some cases of neural dysfunction, this ability is severely affected, which makes everyday activities that require communication a challenge. This paper studies different parameters of an intelligent imaginary speech recognition system to obtain the best performance of the developed method while remaining applicable to a low-cost system with limited resources. In developing the system, we used signals from the Kara One database containing recordings acquired for seven phonemes and four words. In the feature extraction stage, we used a method based on covariance in the frequency domain that performed better than the time-domain methods. Further, we observed the system performance when using different window lengths for the input signal (0.25 s, 0.5 s, and 1 s) to highlight the importance of short-term analysis of the signals for imaginary speech. Since the final goal is the development of a low-cost system, we studied several convolutional neural network (CNN) architectures and showed that a more complex architecture does not necessarily lead to better results. Our study was conducted on eight different subjects and is meant to yield a subject-shared system. The best performance reported in this paper is up to 37% accuracy for all 11 phonemes and words when using cross-covariance computed over the signal spectrum of a 0.25 s window and a CNN containing two convolutional layers with 64 and 128 filters connected to a dense layer with 64 neurons. The final system qualifies as a low-cost system, using limited resources for decision-making and having a running time of 1.8 ms when tested on an AMD Ryzen 7 4800HS CPU.
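A rough sketch of the described front end: cross-covariance computed over the channel spectra of a short window, fed to a CNN with two convolutional layers (64 and 128 filters) and a 64-unit dense layer; the channel count, sampling rate, kernel sizes, and pooling are placeholder assumptions:

import numpy as np
import torch
import torch.nn as nn

def spectral_covariance(window):
    # window: (n_channels, n_samples) EEG segment (e.g., a 0.25 s window).
    # Cross-covariance is computed over the magnitude spectra of the channels.
    spectra = np.abs(np.fft.rfft(window, axis=1))
    return np.cov(spectra)

window = np.random.randn(62, 250)                 # placeholder 0.25 s window
cov = torch.tensor(spectral_covariance(window), dtype=torch.float32)

model = nn.Sequential(                            # two conv layers (64, 128 filters) + 64-unit dense layer
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128 * 15 * 15, 64), nn.ReLU(),
    nn.Linear(64, 11),                            # 7 phonemes + 4 words
)
logits = model(cov.unsqueeze(0).unsqueeze(0))     # (1, 1, 62, 62) -> (1, 11)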