Dataset Information

Explaining Sentiment Classification with Synthetic Exemplars and Counter-Exemplars

ABSTRACT: We present xspells, a model-agnostic local approach for explaining the decisions of a black box model for sentiment classification of short texts. The explanations provided consist of a set of exemplar sentences and a set of counter-exemplar sentences. The former are examples classified by the black box with the same label as the text to explain. The latter are examples classified with a different label (a form of counter-factuals). Both are close in meaning to the text to explain, and both are meaningful sentences, albeit synthetically generated. xspells generates neighbors of the text to explain in a latent space using Variational Autoencoders for encoding text and decoding latent instances. A decision tree is learned from randomly generated neighbors, and used to drive the selection of the exemplars and counter-exemplars. We report experiments on two datasets showing that xspells outperforms the well-known lime method in terms of quality of explanations, fidelity, and usefulness, and that it is comparable to lime in terms of stability.

SUBMITTER: Appice A

PROVIDER: S-EPMC7556386 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature
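
The pipeline outlined in the abstract (encode the text, sample latent neighbors, decode and label them with the black box, fit a tree, then pick exemplars and counter-exemplars) can be illustrated with a minimal sketch. This is not the authors' implementation. The Variational Autoencoder is replaced by a hand-written one-dimensional toy encoder and decoder, the black box is a toy lexicon classifier, and the decision tree is reduced to a depth-one stump. All names (`black_box`, `encode`, `decode`, `fit_stump`, `explain`) and all parameter values are hypothetical.

```python
import random

# Hypothetical stand-ins: a real system would use a trained VAE and a real classifier.
ADJECTIVES = ["terrible", "bad", "dull", "fine", "good", "great"]
POSITIVE = {"fine", "good", "great"}


def black_box(text):
    """Toy sentiment classifier, treated as an opaque model."""
    return "pos" if any(w in POSITIVE for w in text.split()) else "neg"


def encode(text):
    """Toy encoder (assumption): map a sentence to a 1-D latent value in [0, 1)."""
    words = text.split()
    for i, adj in enumerate(ADJECTIVES):
        if adj in words:
            return (i + 0.5) / len(ADJECTIVES)
    raise ValueError("no known adjective in text")


def decode(z):
    """Toy decoder (assumption): map a latent value back to a sentence."""
    z = min(max(z, 0.0), 0.999)
    return f"the movie was {ADJECTIVES[int(z * len(ADJECTIVES))]}"


def fit_stump(zs, labels):
    """Depth-one decision tree on z: keep the threshold with the fewest errors."""
    def majority(ys, default):
        return max(sorted(set(ys)), key=ys.count) if ys else default

    overall = majority(labels, None)
    best = None
    for t in sorted(set(zs)):
        left = [y for z, y in zip(zs, labels) if z < t]
        right = [y for z, y in zip(zs, labels) if z >= t]
        l_lab, r_lab = majority(left, overall), majority(right, overall)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    _, t, l_lab, r_lab = best
    return lambda z: r_lab if z >= t else l_lab


def explain(text, n_neighbors=200, k=3, seed=0):
    """Return the black-box label plus up to k exemplars and k counter-exemplars."""
    rng = random.Random(seed)
    z0 = encode(text)
    label = black_box(text)
    # Randomly generated latent neighbors, decoded and labeled by the black box.
    zs = [z0 + rng.gauss(0.0, 0.3) for _ in range(n_neighbors)]
    sentences = [decode(z) for z in zs]
    tree = fit_stump(zs, [black_box(s) for s in sentences])
    # Walk neighbors from nearest to farthest; the tree decides which list each joins.
    exemplars, counters = [], []
    for z, s in sorted(zip(zs, sentences), key=lambda p: abs(p[0] - z0)):
        if s == text:
            continue
        target = exemplars if tree(z) == label else counters
        if s not in target and len(target) < k:
            target.append(s)
    return label, exemplars, counters


if __name__ == "__main__":
    label, exemplars, counters = explain("the movie was good")
    print("label:", label)
    print("exemplars:", exemplars)
    print("counter-exemplars:", counters)
```

In this toy setting the stump separates the sampled neighbors perfectly, so every exemplar carries the black-box label of the input and every counter-exemplar carries the opposite one. With a learned latent space and a deeper tree, that agreement would only be approximate.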

Similar Datasets

Improving Sentiment Classification Performance through Coaching Architectures.

Project description:Intelligent systems have been developed for years to solve specific tasks automatically. An important issue emerges when the information used by these systems exhibits a dynamic nature and evolves. This fact adds a level of complexity that makes these systems prone to a noticeable worsening of their performance. Thus, their capabilities have to be upgraded to address these new requirements. Furthermore, this problem is even more challenging when the information comes from human individuals and their interactions through language. This issue happens more easily and forcefully in the specific domain of Sentiment Analysis, where feelings and opinions of humans are in constant evolution. In this context, systems are trained with an enormous corpus of textual content, or they include an extensive set of words and their related sentiment values. These solutions are usually static and generic, making their manual upgrading almost unworkable. In this paper, an automatic and interactive coaching architecture is proposed. It includes an ML framework and a dictionary-based system, both trained for a specific domain. These systems converse about the outcomes obtained during their respective learning stages by simulating human interactive coaching sessions. This leads to an Active Learning process where the dictionary-based system acquires new information and improves its performance. More than 800,000 tweets have been gathered and processed for experiments. Outstanding results were obtained when the proposed architecture was used. Also, the lexicon was updated with the prior and new words related to the corpus used, which is important to reach a better sentiment analysis classification.

| S-EPMC9043891 | biostudies-literature

A synthetic distributed genetic multi-bit counter.

Project description:A design for genetically encoded counters is proposed via repressor-based circuits. An N-bit counter reads sequences of input pulses and displays the total number of pulses, modulo 2^N. The design is based on distributed computation with specialized cell types allocated to specific tasks. This allows scalability and bypasses constraints on the maximal number of circuit genes per cell due to toxicity or failures due to resource limitations. The design starts with a single-bit counter. The N-bit counter is then obtained by interconnecting (using diffusible chemicals) a set of N single-bit counters and connector modules. An optimization framework is used to determine appropriate gate parameters and to compute bounds on admissible pulse widths and relaxation (inter-pulse) times, as well as to guide the construction of novel gates. This work can be viewed as a step toward obtaining circuits that are capable of finite automaton computation in analogy to digital central processing units.

| S-EPMC8666654 | biostudies-literature

Race for Second Place? Explaining East-West Differences in Anti-Muslim Sentiment in Germany.

Project description:It has been shown that anti-Muslim sentiment is more pronounced in East Germany than in West Germany. In this paper, we discuss existing explanations and add to them. We argue that some East Germans see themselves as a disadvantaged group in competition with other minorities, such as Muslims, for social recognition by West Germans; they are in what we call a "race for second place". Based on social identity theory, we expect that this might be particularly true for those who explicitly self-identify as East Germans. The theoretical discussion carves out the role of "perceived non-recognition" and "outgroup mobility threat" as important concepts within the conflicts of belonging. We use unique data from the survey "Postmigrant Societies: East-Migrant Analogies" for a comprehensive empirical analysis. We find that factors related to pre-existing arguments - such as socioeconomic and demographic variables, personality traits, or contact - can capture much of the group differences in anti-Muslim sentiment, but that they do not fully apply to those who were born and still live in the East and who explicitly self-identify as East Germans. For this subgroup, perceived non-recognition adds to the empirical models and outgroup mobility threat has a stronger effect.

| S-EPMC8632242 | biostudies-literature

Adaptive sentiment analysis using multioutput classification: a performance comparison.

Project description:The primary objective of this research is to create a multi-output classification model for sentiment analysis through the combination of 10 algorithms: BernoulliNB, Decision Tree, K-nearest neighbor, Logistic Regression, LinearSVC, Bagging, Stacking, Random Forest, AdaBoost, and ExtraTrees. In doing so, we aim to identify the optimal algorithm performance and role within the model. The data utilized in this study is derived from customer reviews of cryptocurrencies in Indonesia. Our results indicate that LinearSVC and Stacking exhibit a high accuracy (90%) compared to the other eight algorithms. The resulting multi-output model demonstrates an average accuracy of 88%, which can be considered satisfactory. This research endeavors to innovate in adaptive sentiment analysis classification by developing a multi-output model that utilizes a combination of 10 classification algorithms.

| S-EPMC10280487 | biostudies-literature

Advancing sentiment classification through a population game model approach.

Project description:Computational Sentiment Analysis involves the automation of human emotion comprehension by categorizing sentiments as positive, negative, or neutral. In the contemporary digital environment, the extensive volume of social media content presents significant challenges for manual analysis, thereby necessitating the development and implementation of automated analytical tools. To address the limitations of existing techniques, which heavily rely on machine learning and extensive dataset pre-training, we propose an innovative unsupervised approach for sentiment classification. This novel methodology is grounded in game theory concepts, particularly the population game model, offering a promising solution by circumventing the need for extensive training procedures. We extract two textual features from review comments, namely context score and emotion score. Leveraging lexicon databases and numeric scores, this cognitive mathematical framework is language-independent. Competitive results are demonstrated across various domains (hotels, restaurants, electronic devices, etc.), and the efficacy of the proposed work is validated in two languages (English and Hindi). The highest accuracy recorded for the English domain dataset is 89%, while electronic Hindi reviews attain an 84% accuracy rate. The proposed model exhibits domain and language independence, validated through statistical analyses confirming the significance of the findings. The framework demonstrates noteworthy rationality and coherence in its outcomes.

| S-EPMC11375044 | biostudies-literature

Lexicon-enhanced sentiment analysis framework using rule-based classification scheme.

Project description:With the rapid increase in social networks and blogs, social media services are increasingly being used by online communities to share their views and experiences about a particular product, policy and event. Due to the economic importance of these reviews, there is a growing trend of writing user reviews to promote a product. Nowadays, users prefer online blogs and review sites to purchase products. Therefore, user reviews are considered an important source of information in Sentiment Analysis (SA) applications for decision making. In this work, we exploit the wealth of user reviews, available through the online forums, to analyze the semantic orientation of words by categorizing them into +ive and -ive classes to identify and classify emoticons, modifiers, general-purpose and domain-specific words expressed in the public's feedback about the products. However, the un-supervised learning approach employed in previous studies is becoming less efficient due to data sparseness, low accuracy due to non-consideration of emoticons, modifiers, and presence of domain specific words, as they may result in inaccurate classification of users' reviews. Lexicon-enhanced sentiment analysis based on a rule-based classification scheme is an alternative approach for improving sentiment classification of users' reviews in online communities. In addition to the sentiment terms used in general purpose sentiment analysis, we integrate emoticons, modifiers and domain specific terms to analyze the reviews posted in online communities. To test the effectiveness of the proposed method, we considered user reviews in three domains. The results obtained from different experiments demonstrate that the proposed method overcomes limitations of previous methods and the performance of the sentiment analysis is improved after considering emoticons, modifiers, negations, and domain specific terms when compared to baseline methods.

| S-EPMC5322980 | biostudies-literature

Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics.

Project description:Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it: to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points. Despite their theoretical capacity to implement complex, high-dimensional computations, we find that trained networks converge to highly interpretable, low-dimensional representations. In particular, the topological structure of the fixed points and corresponding linearized dynamics reveal an approximate line attractor within the RNN, which we can use to quantitatively understand how the RNN solves the sentiment analysis task. Finally, we find this mechanism present across RNN architectures (including LSTMs, GRUs, and vanilla RNNs) trained on multiple datasets, suggesting that our findings are not unique to a particular architecture or dataset. Overall, these results demonstrate that surprisingly universal and human interpretable computations can arise across a range of recurrent networks.

| S-EPMC7416638 | biostudies-literature

Domain adaptive learning for multi realm sentiment classification on big data

Project description: Not available

| S-EPMC10984522 | biostudies-literature

Explaining Deep Classification of Time-Series Data with Learned Prototypes.

Project description:The emergence of deep learning networks raises a need for explainable AI so that users and domain experts can be confident applying them to high-risk decisions. In this paper, we leverage data from the latent space induced by deep learning models to learn stereotypical representations or "prototypes" during training to elucidate the algorithmic decision-making process. We study how leveraging prototypes affects classification decisions of two-dimensional time-series data in a few different settings: (1) electrocardiogram (ECG) waveforms to detect clinical bradycardia, a slowing of heart rate, in preterm infants, (2) respiration waveforms to detect apnea of prematurity, and (3) audio waveforms to classify spoken digits. We improve upon existing models by optimizing for increased prototype diversity and robustness, visualize how these prototypes in the latent space are used by the model to distinguish classes, and show that prototypes are capable of learning features on two-dimensional time-series data to produce explainable insights during classification tasks. We show that the prototypes are capable of learning real-world features - bradycardia in ECG, apnea in respiration, and articulation in speech - as well as features within sub-classes. Our novel work leverages a learned prototypical framework on two-dimensional time-series data to produce explainable insights during classification tasks.

| S-EPMC8050893 | biostudies-literature

Intelligent topical sentiment analysis for the classification of e-learners and their topics of interest.

Project description:Every day, huge numbers of instant tweets (messages) are published on Twitter, as it is one of the major social media platforms for e-learner interactions. The options regarding various interesting topics to be studied are discussed among the learners and teachers through the capture of ideal sources in Twitter. The common sentiment behavior towards these topics is received through the massive number of instant messages about them. In this paper, rather than using the opinion polarity of each message relevant to the topic, the authors focus on sentence-level opinion classification using the unsupervised algorithm named bigram item response theory (BIRT). It differs from the traditional classification and document-level classification algorithms. The investigation presented in this paper is threefold: (1) lexicon-based sentiment polarity of tweet messages; (2) the bigram co-occurrence relationship using naïve Bayesian; (3) the bigram item response theory (BIRT) on various topics. A model using item response theory is proposed for topical classification inference. The performance has been improved remarkably using this bigram item response theory when compared with other supervised algorithms. The experiment has been conducted on a real-life dataset containing different sets of tweets and topics.

| S-EPMC4381865 | biostudies-other
