Dataset Information

Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets.

ABSTRACT: Construction of knowledge repositories from web corpora by harvesting linguistic patterns benefits many natural language processing applications that rely on question-answering schemes. These methods require minimal or no human intervention and can recursively learn new relational fact instances in a fully automated and scalable manner. This paper explores the performance of a tolerance rough set-based learner with respect to two important issues, scalability and its effect on concept drift, by (1) designing a new version of the semi-supervised tolerance rough set-based pattern learner (TPL 2.0), (2) adapting a tolerance form of rough set methodology to categorize linguistic patterns, and (3) extracting categorical information from a large noisy dataset of crawled web pages. This work demonstrates that the TPL 2.0 learner is promising in terms of the precision@30 metric when compared with three benchmark algorithms: Tolerant Pattern Learner 1.0, Fuzzy-Rough Set Pattern Learner, and Coupled Bayesian Sets-based learner.
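
The abstract reports results as precision@30 but does not define it. As a minimal sketch, assuming the usual information-retrieval definition (the fraction of the top k ranked items that are judged correct), the metric could be computed as below. The function name `precision_at_k` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], correct: Set[str], k: int = 30) -> float:
    """Return the fraction of the top-k ranked items that are in `correct`.

    Assumption: the denominator is k itself, so a ranking shorter than k
    is penalised for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]  # slicing is safe even when len(ranked) < k
    hits = sum(1 for item in top_k if item in correct)
    return hits / k


# Toy example (hypothetical data): 2 of the top 4 extracted instances are correct.
ranked_instances = ["toronto", "paris", "banana", "lima"]
judged_correct = {"toronto", "lima"}
score = precision_at_k(ranked_instances, judged_correct, k=4)
```

With k=30, the same call would score the 30 highest-ranked instances that a learner promotes for a category.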

SUBMITTER: Moghaddam HR

PROVIDER: S-EPMC7318947 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

Textrous!: extracting semantic textual meaning from gene sets.

Project description:The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically-relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily-appreciable scientific textual 'tokens' from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.

| S-EPMC3639949 | biostudies-literature

Rough sets: past, present, and future.

Project description:Introduction of rough sets by Professor Zdzisław Pawlak has completed 35 years. The theory has already attracted the attention of many researchers and practitioners, who have contributed essentially to its development, from all over the world. The methods, developed based on rough set theory alone or in combination with other approaches, found applications in many areas. In this article, we outline some selected past and present research directions of rough sets. In particular, we emphasize the importance of searching strategies for relevant approximation spaces as the basic tools in achieving computational building blocks (granules or patterns) required for approximation of complex vague concepts. We also discuss new challenges related to problem solving by intelligent systems (IS) or complex adaptive systems (CAS). The concern is to control problems using interactive granular computing, an extension of the rough set approach, for effective realization of computations realized in IS or CAS. These challenges are important for the development of natural computing too.

| S-EPMC6244804 | biostudies-other

Rough sets for in silico identification of differentially expressed miRNAs.

Project description:The microRNAs, also known as miRNAs, are the class of small noncoding RNAs. They repress the expression of a gene posttranscriptionally. In effect, they regulate expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of a disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer and other diseases. Unlike with mRNAs, a modest number of miRNAs might be sufficient to classify human cancers. However, the absence of a robust method to identify differentially expressed miRNAs makes this an open problem. In this regard, this paper presents a novel approach for in silico identification of differentially expressed miRNAs from microarray expression data sets. It integrates judiciously the theory of rough sets and merit of the so-called B.632+ bootstrap error estimate. While rough sets select relevant and significant miRNAs from expression data, the B.632+ error rate minimizes the variability and bias of the derived results. The effectiveness of the proposed approach, along with a comparison with other related approaches, is demonstrated on several miRNA microarray expression data sets, using the support vector machine.

| S-EPMC3790281 | biostudies-literature

PADI-web corpus: Labeled textual data in animal health domain.

Project description:Monitoring animal health worldwide, especially the early detection of outbreaks of emerging pathogens, is one of the means of preventing the introduction of infectious diseases in countries (Collier et al., 2008) [3]. In this context, we developed PADI-web, a Platform for Automated extraction of animal Disease Information from the Web (Arsevska et al., 2016, 2018). PADI-web is a text-mining tool that automatically detects, categorizes and extracts disease outbreak information from Web news articles. PADI-web currently monitors the Web for five emerging animal infectious diseases, i.e., African swine fever, avian influenza including highly pathogenic and low pathogenic avian influenza, foot-and-mouth disease, bluetongue, and Schmallenberg virus infection. PADI-web collects Web news articles in near-real time through RSS feeds. Currently, PADI-web collects disease information from Google News because of its international and multiple language coverage. We implemented machine learning techniques to identify the relevant disease information in texts (i.e., location and date of an outbreak, affected hosts, their numbers and clinical signs). In order to train the model for Information Extraction (IE) from news articles, a corpus in English has been manually labeled by domain experts. This labeled corpus (Rabatel et al., 2017) is presented in this data paper.

| S-EPMC6327737 | biostudies-literature

Uncertainty analysis of knowledge reductions in rough sets.

Project description:Uncertainty analysis is a vital issue in intelligent information processing, especially in the age of big data. Rough set theory has attracted much attention to this field since it was proposed. Relative reduction is an important problem of rough set theory. Different relative reductions have been investigated for preserving some specific classification abilities in various applications. This paper examines the uncertainty analysis of five different relative reductions in four aspects, that is, reducts' relationship, boundary region granularity, rules variance, and uncertainty measure according to a constructed decision table.

| S-EPMC4166434 | biostudies-other

Hydrophilic directional slippery rough surfaces for water harvesting.

Project description:Multifunctional surfaces that are favorable for both droplet nucleation and removal are highly desirable for water harvesting applications but are rare. Inspired by the unique functions of pitcher plants and rice leaves, we present a hydrophilic directional slippery rough surface (SRS) that is capable of rapidly nucleating and removing water droplets. Our surfaces consist of nanotextured directional microgrooves in which the nanotextures alone are infused with hydrophilic liquid lubricant. We have shown through molecular dynamics simulations that the physical origin of the efficient droplet nucleation is attributed to the hydrophilic surface functional groups, whereas the rapid droplet removal is due to the significantly reduced droplet pinning of the directional surface structures and slippery interface. We have further demonstrated that the SRS, owing to its large surface area, hydrophilic slippery interface, and directional liquid repellency, outperforms conventional liquid-repellent surfaces in water harvesting applications.

| S-EPMC5903897 | biostudies-other

Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs.

Project description:We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and apply them to determine the authors of documents. On average, our method outperforms the state of the art approaches and gives consistently high results across different corpora, unlike existing methods. Our results show that our textual patterns are useful for the task of authorship attribution.

| S-EPMC5038652 | biostudies-literature

Advancing Reverse Electrowetting-on-Dielectric from Planar to Rough Surface Electrodes for High Power Density Energy Harvesting.

Project description:Reverse electrowetting-on-dielectric (REWOD)-based energy harvesting has been studied over the last decade as a novel technique of harvesting energy by actuating liquid droplet(s) utilizing applied mechanical modulation. Much prior research in REWOD has relied on planar electrodes, which by their geometry possess a limited surface area. In addition, most of the prior REWOD works have applied a high bias voltage to enhance the output power, which compromises the concept of self-powering wearable motion sensors in human health monitoring applications. In order to enhance the REWOD power density resulting from an increased electrode-electrolyte interfacial area, high surface area electrodes are required. Herein, electrical and multiphysics-based modeling approaches of REWOD energy harvester using structured rough surface electrodes are presented. By enhancing the overall available surface area, an increase in the overall capacitance is achieved. COMSOL and MATLAB-based models are also developed, and the empirical results are compared with the models to validate the performance. Root mean square (RMS) power density is calculated using the RMS voltage across an optimal load impedance. For the proposed rough electrode REWOD energy harvester, maximum power density of 3.18 μW cm⁻² is achieved at 5 Hz frequency, which is ≈4 times higher than that of the planar electrodes.

| S-EPMC9285574 | biostudies-literature

Knowledge Beacons: Web services for data harvesting of distributed biomedical knowledge.

Project description:Availability: The API and associated software are open source and currently available for access at https://github.com/NCATS-Tangerine/translator-knowledge-beacon.

| S-EPMC7987184 | biostudies-literature

Evaluation and Verification of the Global Rapid Identification of Threats System for Infectious Diseases in Textual Data Sources.

Project description:The Global Rapid Identification of Threats System (GRITS) is a biosurveillance application that enables infectious disease analysts to monitor nontraditional information sources (e.g., social media, online news outlets, ProMED-mail reports, and blogs) for infectious disease threats. GRITS analyzes these textual data sources by identifying, extracting, and succinctly visualizing epidemiologic information and suggests potentially associated infectious diseases. This manuscript evaluates and verifies the diagnoses that GRITS performs and discusses novel aspects of the software package. Via GRITS' web interface, infectious disease analysts can examine dynamic visualizations of GRITS' analyses and explore historical infectious disease emergence events. The GRITS API can be used to continuously analyze information feeds, and the API enables GRITS technology to be easily incorporated into other biosurveillance systems. GRITS is a flexible tool that can be modified to conduct sophisticated medical report triaging, expanded to include customized alert systems, and tailored to address other biosurveillance needs.

| S-EPMC5028852 | biostudies-literature
