Dataset Information

Boolean logic algebra driven similarity measure for text based applications.

ABSTRACT: In Information Retrieval (IR), Data Mining (DM), and Machine Learning (ML), similarity measures are widely used for text clustering and classification. The similarity measure is the cornerstone on which the performance of most DM and ML algorithms depends. Nevertheless, the search in the literature for a similarity measure that is both effective and efficient remains unfinished. Some recently proposed similarity measures are effective but have a complex design and suffer from inefficiencies. This work therefore develops an effective and efficient similarity measure with a simple design for text-based applications. The measure, driven by Boolean logic algebra basics (BLAB-SM), aims to reach the desired accuracy with the fastest run time compared with recently developed state-of-the-art measures. A comprehensive evaluation is presented using the term frequency-inverse document frequency (TF-IDF) scheme, the K-nearest neighbor (KNN) classifier, and the K-means clustering algorithm. BLAB-SM was evaluated experimentally against seven similarity measures on two popular datasets, Reuters-21 and Web-KB. The experimental results show that BLAB-SM is not only more efficient but also significantly more effective than state-of-the-art similarity measures on both classification and clustering tasks.
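
The evaluation setup named in the abstract (TF-IDF weighting followed by KNN classification with a chosen similarity measure) can be sketched roughly as follows. This is a minimal illustration only: the BLAB-SM formula is not given in this record, so plain cosine similarity stands in for it, and the toy documents, labels, and function names (fit_idf, tfidf, cosine, knn_predict) are assumptions made for the example, not the authors' code.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn smoothed inverse document frequencies from tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Turn one tokenized document into a sparse TF-IDF vector (dict)."""
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity; a stand-in for any pluggable measure such as BLAB-SM."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(query, train_vecs, labels, k=3, sim=cosine):
    """Label a query vector by majority vote among its k most similar neighbors."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: sim(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus (not taken from Reuters or Web-KB).
train_docs = [
    ["oil", "price", "rise"],
    ["crude", "oil", "export"],
    ["student", "course", "page"],
    ["faculty", "course", "project"],
]
train_labels = ["trade", "trade", "academic", "academic"]

idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]
prediction = knn_predict(tfidf(["oil", "export", "price"], idf),
                         train_vecs, train_labels)
print(prediction)
```

Because the similarity function is passed in through the sim parameter, a different measure can be swapped in without touching the rest of the pipeline, which is the kind of side-by-side comparison the abstract describes.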

SUBMITTER: Abdalla HI

PROVIDER: S-EPMC8330432 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Boolean logic algebra driven similarity measure for text based applications.

Abdalla Hassan I, Amer Ali A

PeerJ Computer Science, 2021-07-29

PMID: 34401474

Similar Datasets

Project description: Background: A key problem in the analysis of mathematical models of molecular networks is the determination of their steady states. The present paper addresses this problem for Boolean network models, an increasingly popular modeling paradigm for networks lacking detailed kinetic information. For small models, the problem can be solved by exhaustive enumeration of all state transitions. But for larger models this is not feasible, since the size of the phase space grows exponentially with the dimension of the network. The dimension of published models is growing to over 100, so that efficient methods for steady state determination are essential. Several methods have been proposed for large networks, some of them heuristic. While these methods represent a substantial improvement in scalability over exhaustive enumeration, the problem for large networks is still unsolved in general. Results: This paper presents an algorithm that consists of two main parts. The first is a graph theoretic reduction of the wiring diagram of the network, while preserving all information about steady states. The second part formulates the determination of all steady states of a Boolean network as a problem of finding all solutions to a system of polynomial equations over the finite number system with two elements. This problem can be solved with existing computer algebra software. This algorithm compares favorably with several existing algorithms for steady state determination. One advantage is that it is not heuristic or reliant on sampling, but rather determines algorithmically and exactly all steady states of a Boolean network. The code for the algorithm, as well as the test suite of benchmark networks, is available upon request from the corresponding author. Conclusions: The algorithm presented in this paper reliably determines all steady states of sparse Boolean networks with up to 1000 nodes. The algorithm is effective at analyzing virtually all published models even those of moderate connectivity. The problem for large Boolean networks with high average connectivity remains an open problem.

Project description: Biological systems contain a large number of molecules that have diverse interactions. A fruitful path to understanding these systems is to represent them with interaction networks, and then describe flow processes in the network with a dynamic model. Boolean modeling, the simplest discrete dynamic modeling framework for biological networks, has proven its value in recapitulating experimental results and making predictions. A first step and major roadblock to the widespread use of Boolean networks in biology is the laborious network inference and construction process. Here we present a streamlined network inference method that combines the discovery of a parsimonious network structure and the identification of Boolean functions that determine the dynamics of the system. This inference method is based on a causal logic analysis method that associates a logic type (sufficient or necessary) to node-pair relationships (whether promoting or inhibitory). We use the causal logic framework to assimilate indirect information obtained from perturbation experiments and infer relationships that have not yet been documented experimentally. We apply this inference method to a well-studied process of hormone signaling in plants, the signaling underlying abscisic acid (ABA)-induced stomatal closure. Applying the causal logic inference method significantly reduces the manual work typically required for network and Boolean model construction. The inferred model agrees with the manually curated model. We also test this method by re-inferring a network representing epithelial to mesenchymal transition based on a subset of the information that was initially used to construct the model. We find that the inference method performs well for various likely scenarios of inference input information. We conclude that our method is an effective approach toward inference of biological networks and can become an efficient step in the iterative process between experiments and computations.

Project description: Background: More so than face-to-face counseling, users of online text-based services might drop out from a session before establishing a clear closure or expressing the intention to leave. Such premature departure may be indicative of heightened risk or dissatisfaction with the service or counselor. However, there is no systematic way to identify this understudied phenomenon. Purpose: This study has two objectives. First, we developed a set of rules and used logic-based pattern matching techniques to systematically identify premature departures in an online text-based counseling service. Second, we validated the importance of premature departure by examining its association with user satisfaction. We hypothesized that the users who rated the session as less helpful were more likely to have departed prematurely. Method: We developed and tested a classification model using a sample of 575 human-annotated sessions from an online text-based counseling platform. We used 80% of the dataset to train and develop the model and 20% of the dataset to evaluate the model performance. We further applied the model to the full dataset (34,821 sessions). We compared user satisfaction between premature departure and completed sessions based on data from a post-session survey. Results: The resulting model achieved 97% and 92% F1 score in detecting premature departure cases in the training and test sets, respectively, suggesting it is highly consistent with the judgment of human coders. When applied to the full dataset, the model classified 15,150 (43.5%) sessions as premature departure and the remaining 19,671 (56.5%) as completed sessions. Completed cases (15.2%) were more likely to fill the post-chat survey than premature departure cases (4.0%). Premature departure was significantly associated with lower perceived helpfulness and effectiveness in distress reduction. Conclusions: The model is the first that systematically and accurately identifies premature departure in online text-based counseling. It can be readily modified and transferred to other contexts for the purpose of risk mitigation and service evaluation and improvement.
