Dataset Information

Convalescing Cluster Configuration Using a Superlative Framework.

ABSTRACT: Competent data mining methods are vital to discover knowledge from databases that are built as a result of the enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. The K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer from certain problems in their performance. To overcome these issues, a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to the K-means approach, which iteratively segments the data objects into their respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measures are higher than those of the other two approaches, namely, simple K-means and the Binary Search method. Thus, the proposed approach proves that the discretization process will improve the efficacy of descriptive data mining tasks.
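The pipeline outlined in the abstract (discretize, generate initial centroids, then run K-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which discretization scheme is used, so equal-width binning is assumed, and `initial_centroids` is a simple evenly spaced stand-in rather than the binary search initialization method. All function names are hypothetical.

```python
import math


def discretize(rows, n_bins=5):
    """Equal-width binning per column (an assumed scheme); returns bin indices as floats."""
    binned_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
        # Clamp so the column maximum lands in the last bin.
        binned_cols.append([float(min(int((v - lo) / width), n_bins - 1)) for v in col])
    return [list(r) for r in zip(*binned_cols)]


def initial_centroids(rows, k):
    """Stand-in initializer (NOT the binary search method): k points evenly
    spaced between the per-column minimum and maximum."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [lo + (hi - lo) * (i + 0.5) / k for lo, hi in zip(lows, highs)]
        for i in range(k)
    ]


def kmeans(rows, centroids, max_iter=100):
    """Plain Lloyd iterations: assign each row to its nearest centroid, then
    recompute each centroid as the mean of its members."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's list
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(r, centroids[j]))
            for r in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def cluster(rows, k, n_bins=5):
    """Discretize, initialize, then run K-means on the binned data."""
    binned = discretize(rows, n_bins)
    return kmeans(binned, initial_centroids(binned, k))
```

Calling `cluster(rows, k=2, n_bins=4)` on a list of numeric feature rows returns one cluster label per row together with the final centroids, expressed in bin-index space rather than in the original feature units.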

SUBMITTER: Sabitha R

PROVIDER: S-EPMC4620246 | biostudies-other | 2015

REPOSITORIES: biostudies-other

Similar Datasets

Project description:Often, a community becomes alarmed when high rates of cancer are noticed, and residents suspect that the cancer cases could be caused by a known source of hazard. In response, the US Centers for Disease Control and Prevention recommend that departments of health perform a standardized incidence ratio (SIR) analysis to determine whether the observed cancer incidence is higher than expected. This approach has several limitations that are well documented in the existing literature. In this paper we propose a novel causal inference framework for cancer cluster investigations, rooted in the potential outcomes framework. Assuming that a source of hazard representing a potential cause of increased cancer rates in the community is identified a priori, we focus our approach on a causal inference estimand which we call the causal SIR (cSIR). The cSIR is a ratio defined as the expected cancer incidence in the exposed population divided by the expected cancer incidence for the same population under the (counterfactual) scenario of no exposure. To estimate the cSIR we need to overcome two main challenges: 1) identify unexposed populations that are as similar as possible to the exposed one to inform estimation of the expected cancer incidence under the counterfactual scenario of no exposure, and 2) publicly available data on cancer incidence for these unexposed populations are often available at a much higher level of spatial aggregation (e.g. county) than what is desired (e.g. census block group). We overcome the first challenge by relying on matching. We overcome the second challenge by building a Bayesian hierarchical model that borrows information from other sources to impute cancer incidence at the desired level of spatial aggregation. In simulations, our statistical approach was shown to provide dramatically improved results, i.e., less bias and better coverage, than the current approach to SIR analyses. 
We apply our proposed approach to investigate whether trichloroethylene vapor exposure has caused increased cancer incidence in Endicott, New York.
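The cSIR described above can be written compactly in potential-outcomes notation. The symbols here are illustrative assumptions and are not taken from the paper: Y(1) denotes cancer incidence in the exposed population under exposure, and Y(0) denotes incidence in that same population under the counterfactual scenario of no exposure.

```latex
\mathrm{cSIR}
  = \frac{\mathbb{E}\left[\, Y(1) \mid \text{exposed population} \,\right]}
         {\mathbb{E}\left[\, Y(0) \mid \text{exposed population} \,\right]}
```

Under this reading, a cSIR above 1 would mean that more cancer cases are expected under exposure than would have been expected without it.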

Project description:Constraint-based models use steady-state mass balances to define a solution space of flux configurations, which can be narrowed down by measuring as many fluxes as possible. Due to loops and redundant pathways, this process typically yields multiple alternative solutions. To address this ambiguity, flux sampling can estimate the probability distribution of each flux, or a flux configuration can be singled out by further minimizing the sum of fluxes according to the assumption that cellular metabolism favors states where enzyme-related costs are economized. However, flux sampling is susceptible to artifacts introduced by thermodynamically infeasible cycles and it is not clear if the economy of fluxes assumption (EFA) is universally valid. Here, we formulated a constraint-based approach, MaxEnt, based on the principle of maximum entropy, which in this context states that if more than one flux configuration is consistent with a set of experimentally measured fluxes, then the one with the minimum amount of unwarranted assumptions corresponds to the best estimation of the non-observed fluxes. We compared MaxEnt predictions to Escherichia coli and Saccharomyces cerevisiae publicly available flux data. We found that the mean square error (MSE) between experimental and predicted fluxes by MaxEnt and EFA-based methods is three orders of magnitude lower than the median of 1,350,000 MSE values obtained using flux sampling. However, only MaxEnt and flux sampling correctly predicted flux through E. coli's glyoxylate cycle, whereas EFA-based methods, in general, predict no flux cycles. We also tested MaxEnt predictions at increasing levels of overflow metabolism. We found that MaxEnt accuracy is not affected by overflow metabolism levels, whereas the EFA-based methods show a decreasing performance.
These results suggest that MaxEnt is less sensitive than flux sampling to artifacts introduced by thermodynamically infeasible cycles and that its predictions are less susceptible to overfitting than EFA-based methods.

Project description:Background: Despite evidence of benefit for pharmacist involvement in chronic disease management, the provision of these services in community pharmacy has been suboptimal. The Promoting Action on Research Implementation in Health Services (PARiHS) framework suggests that for knowledge translation to be effective, there must be evidence of benefit, a context conducive to implementation, and facilitation to support uptake. We hypothesize that while the evidence and context components of this framework are satisfied, uptake into practice has been insufficient because of a lack of facilitation. This protocol describes the rationale and methods of a feasibility study to test a facilitated pharmacy practice intervention based on the PARiHS framework, to assist community pharmacists in increasing the number of formal and documented medication management services completed for patients with diabetes, dyslipidemia, and hypertension. Methods: A cluster-randomized before-after design will compare ten pharmacies from within a single organization, with the unit of randomization being the pharmacy. Pharmacies will be randomized to facilitated intervention based on the PARiHS framework or usual practice. The Alberta Context Tool will be used to establish the context of practice in each pharmacy. Pharmacies randomized to the intervention will receive task-focused facilitation from an external facilitator, with the goal of developing alternative team processes to allow the greater provision of medication management services for patients with diabetes, hypertension, and dyslipidemia. The primary outcome will be a process evaluation of the needs of community pharmacies to provide more clinical services, the acceptability and uptake of modifications made, and the willingness of pharmacies to participate.
Secondary outcomes will include the change in the number of formal and documented medication management services in the aforementioned chronic conditions provided 6 months before, versus after, the intervention between the two groups, and identification of feasible quantitative outcomes for evaluating the effect of the intervention on patient care outcomes. Results: To date, the study has identified and enrolled the ten pharmacies required and initiated the intervention process. Conclusion: This study will be the first to examine the role of facilitation in pharmacy practice, with the goal of scalable and sustainable practice change. Trial registration: Clinicaltrials.gov identifier NCT02191111.

