Dataset Information

A Unified Formulation of k-Means, Fuzzy c-Means and Gaussian Mixture Model by the Kolmogorov-Nagumo Average.

ABSTRACT: Clustering is a major unsupervised learning technique and is widely applied in data mining and statistical data analysis. Typical examples include k-means, fuzzy c-means, and Gaussian mixture models, which are categorized as hard, soft, and model-based clustering, respectively. We propose a new clustering method, called Pareto clustering, based on the Kolmogorov-Nagumo average defined by the survival function of the Pareto distribution. The proposed algorithm incorporates all of the aforementioned methods, as well as maximum-entropy clustering. We introduce a probabilistic framework for the proposed method and discuss the underlying distribution that ensures consistency. We construct a minorize-maximization algorithm to estimate the parameters of Pareto clustering. We compare its performance with existing methods in simulation studies and benchmark dataset analyses to demonstrate its practical utility.
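The Kolmogorov-Nagumo (quasi-arithmetic) average of values v_k with weights w_k under a strictly monotone generator φ is φ⁻¹(Σ_k w_k φ(v_k)). The sketch below computes it and illustrates, under our own assumptions, how different generators relate to the clusterings named in the abstract when v_k is a point's squared distance to center k. The exponential generator φ(z) = exp(−z/τ) gives a log-sum-exp average: with τ = 2 and unit-variance spherical components, −2 times a point's Gaussian-mixture log-likelihood equals this average plus a constant. The power generator φ(z) = z^(1/(1−m)) reproduces, up to a constant factor, the fuzzy c-means objective after minimizing over memberships. As m → 1 or τ → 0, both approach min_k v_k, the hard k-means cost. The helper names (kn_average, *_generator) are hypothetical. The Pareto generator assumes the Pareto type II (Lomax) survival function (1 + z/σ)^(−α). The paper's parameterization may differ, and this is not the authors' implementation.

```python
import math
from typing import Callable, Sequence, Tuple

# A generator is a pair (phi, phi_inv) of a strictly monotone function and its inverse.
Generator = Tuple[Callable[[float], float], Callable[[float], float]]


def kn_average(values: Sequence[float], weights: Sequence[float], gen: Generator) -> float:
    """Kolmogorov-Nagumo average phi_inv(sum_k w_k * phi(v_k))."""
    phi, phi_inv = gen
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return phi_inv(sum(w * phi(v) for v, w in zip(values, weights)))


def identity_generator() -> Generator:
    # phi(z) = z gives the ordinary weighted arithmetic mean.
    return (lambda z: z, lambda y: y)


def exp_generator(tau: float = 1.0) -> Generator:
    # phi(z) = exp(-z / tau): log-sum-exp average (Gaussian-mixture-like).
    return (lambda z: math.exp(-z / tau), lambda y: -tau * math.log(y))


def power_generator(m: float = 2.0) -> Generator:
    # phi(z) = z ** (1 / (1 - m)), m > 1, z > 0: fuzzy-c-means-like power mean.
    p = 1.0 / (1.0 - m)
    return (lambda z: z ** p, lambda y: y ** (1.0 / p))


def pareto_generator(alpha: float, sigma: float) -> Generator:
    # ASSUMPTION: Lomax survival S(z) = (1 + z / sigma) ** (-alpha); the paper's form may differ.
    # log1p / expm1 keep the computation accurate when z / sigma is small.
    phi = lambda z: math.exp(-alpha * math.log1p(z / sigma))
    phi_inv = lambda y: sigma * math.expm1(-math.log(y) / alpha)
    return (phi, phi_inv)


d = [1.0, 4.0]  # hypothetical squared distances from one point to two centers
w = [0.5, 0.5]  # equal mixing weights
print(kn_average(d, w, identity_generator()))           # 2.5
print(kn_average(d, w, power_generator(2.0)))           # about 1.6 (harmonic mean)
print(kn_average(d, w, exp_generator(0.01)))            # about 1.007, close to min(d) = 1.0
print(kn_average(d, w, pareto_generator(1e4, 1e4)))     # about 1.64, close to exp_generator(1.0)
```

Setting σ = α·τ and letting α grow makes (1 + z/σ)^(−α) approach exp(−z/τ). This is one way a single Pareto-type generator can interpolate toward the exponential (mixture-model) case.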

SUBMITTER: Komori O

PROVIDER: S-EPMC8145026 | biostudies-literature | 2021 Apr

REPOSITORIES: biostudies-literature

Publications

A Unified Formulation of k-Means, Fuzzy c-Means and Gaussian Mixture Model by the Kolmogorov-Nagumo Average.

Komori Osamu, Eguchi Shinto

Entropy (Basel, Switzerland), 2021-04-24, issue 5

PMID: 33923177

Similar Datasets

Differential privacy fuzzy C-means clustering algorithm based on Gaussian kernel function.

Project description: The fuzzy C-means clustering algorithm is one of the typical clustering algorithms in data mining applications. However, because datasets may contain sensitive information, user privacy may be leaked during the clustering process. Fuzzy C-means clustering with differential privacy protection can protect individual users' privacy while mining data patterns, but the loss of utility caused by data perturbation is a common problem of these algorithms. To address the loss of accuracy caused by randomly initializing the fuzzy C-means membership matrix, this paper first uses the maximum distance method to determine the initial center points. Then, the Gaussian kernel value of each cluster center point is used to calculate the privacy budget allocation ratio, and Laplace noise is added to complete differential privacy protection. The experimental results demonstrate that the proposed algorithm achieves higher clustering accuracy and effectiveness than the baselines under the same privacy protection intensity.
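The pipeline described above (maximum-distance initialization, fuzzy C-means updates, Laplace noise) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' algorithm. We read "maximum distance method" as farthest-first traversal starting from the first data point. Laplace noise is added only to the membership-weighted coordinate sums. `noise_scale` is a hypothetical stand-in for sensitivity divided by a per-cluster budget, and the paper's Gaussian-kernel budget allocation is not reproduced. All function names are hypothetical, and the sketch makes no formal privacy guarantee.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(data: Sequence[Point], c: int) -> List[Point]:
    """ASSUMPTION: 'maximum distance method' read as farthest-first traversal from data[0]."""
    centers = [data[0]]
    while len(centers) < c:
        # Next center: the point whose nearest existing center is farthest away.
        centers.append(max(data, key=lambda p: min(sq_dist(p, q) for q in centers)))
    return centers


def memberships(data: Sequence[Point], centers: Sequence[Point], m: float) -> List[List[float]]:
    """Standard FCM memberships u_ik = 1 / sum_j (d_ik / d_ij)^(1/(m-1)), d = squared distance."""
    e = 1.0 / (m - 1.0)
    u = []
    for p in data:
        d = [max(sq_dist(p, v), 1e-12) for v in centers]  # floor avoids division by zero
        u.append([1.0 / sum((dk / dj) ** e for dj in d) for dk in d])
    return u


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_fcm(data: Sequence[Point], c: int, m: float = 2.0, iters: int = 30,
              noise_scale: float = 0.0, seed: int = 0) -> List[Point]:
    """Fuzzy c-means whose weighted coordinate sums receive Laplace noise (hypothetical scale)."""
    rng = random.Random(seed)
    centers = farthest_point_init(data, c)
    dim = len(data[0])
    for _ in range(iters):
        u = memberships(data, centers, m)
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]  # fuzzified weights u_ik^m
            total = sum(w)
            coords = []
            for j in range(dim):
                s = sum(wi * p[j] for wi, p in zip(w, data))
                if noise_scale > 0:
                    s += laplace_noise(noise_scale, rng)
                coords.append(s / total)
            new_centers.append(tuple(coords))
        centers = new_centers
    return centers


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(noisy_fcm(pts, 2)))                   # approximately (0.33, 0.33) and (10.33, 10.33)
print(sorted(noisy_fcm(pts, 2, noise_scale=0.5)))  # perturbed versions of those centers
```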

| S-EPMC7987176 | biostudies-literature

Outdoor THz fading modeling by means of Gaussian and gamma mixture distributions.

Project description: The terahertz (THz) band offers a vast amount of bandwidth and is envisioned as a key enabler for a number of next-generation wireless applications. To this end, appropriate channel models that capture both large- and small-scale fading need to be developed for indoor and outdoor communication environments. THz large-scale fading has been extensively investigated in both indoor and outdoor scenarios, and the study of indoor THz small-scale fading has recently gained momentum, but small-scale fading in outdoor THz wireless channels has not yet been investigated. Motivated by this, this contribution introduces the Gaussian mixture (GM) distribution as a suitable small-scale fading model for outdoor THz wireless links. Specifically, multiple outdoor THz wireless measurements recorded at different transceiver separation distances are fed to an expectation-maximization fitting algorithm, which returns the parameters of the GM probability density function. The fitting accuracy of the analytical GMs is evaluated with the Kolmogorov-Smirnov, Kullback-Leibler (KL), and root-mean-square-error (RMSE) tests. The results reveal that the analytical GMs fit the empirical distributions better as the number of mixture components increases. In addition, the KL and RMSE metrics indicate that increasing the number of components beyond a certain point yields no significant improvement in fitting accuracy. Finally, following the same approach as for the GM, we examine whether a Gamma mixture can capture the small-scale fading characteristics of outdoor THz channels.
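The fitting step described above uses expectation-maximization for a Gaussian mixture, scored in part by the Kolmogorov-Smirnov distance. A minimal one-dimensional sketch of both follows. It runs on synthetic samples, not the THz measurements. The quantile-based initialization of the means, the fixed iteration count, and the variance floor are our assumptions. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x: float, mu: float, var: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def em_gmm_1d(xs: Sequence[float], k: int, iters: int = 200) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D k-component Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(xs)
    s = sorted(xs)
    mus = [s[int((j + 0.5) * n / k)] for j in range(k)]  # ASSUMPTION: quantile initialization
    mean = sum(xs) / n
    vars_ = [sum((x - mean) ** 2 for x in xs) / n] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities r[i][j] = P(component j | x_i).
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, mu, v) for w, mu, v in zip(ws, mus, vars_)]
            tot = sum(p) or 1e-300
            resp.append([pj / tot for pj in p])
        # M-step: closed-form updates of weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            ws[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return ws, mus, vars_


def ks_statistic(xs: Sequence[float], ws, mus, vars_) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF and the fitted mixture CDF."""
    s = sorted(xs)
    n = len(s)
    d = 0.0
    for i, x in enumerate(s, start=1):
        f = sum(w * normal_cdf(x, mu, v) for w, mu, v in zip(ws, mus, vars_))
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


rng = random.Random(1)
xs = [rng.gauss(-5.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
ws, mus, vars_ = em_gmm_1d(xs, 2)
print(sorted(mus), ks_statistic(xs, ws, mus, vars_))  # means near -5 and 5, small KS distance
```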

| S-EPMC10115853 | biostudies-literature

A Fast Incremental Gaussian Mixture Model.

Project description: This work builds upon previous efforts in online incremental learning, namely the Incremental Gaussian Mixture Network (IGMN). The IGMN can learn from data streams in a single pass, updating its model after analyzing each data point and discarding the point thereafter. However, it scales poorly because its asymptotic time complexity is O(NKD³) for N data points, K Gaussian components, and D dimensions, which makes it unsuitable for high-dimensional data. In this work, we reduce this complexity to O(NKD²) by deriving formulas that work directly with precision matrices instead of covariance matrices. The result is a much faster and more scalable algorithm that can be applied to high-dimensional tasks, as confirmed by applying the modified algorithm to high-dimensional classification datasets.
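The O(D³) → O(D²) reduction described above rests on updating a precision matrix directly instead of re-inverting a covariance matrix. The standard tool for this is the Sherman-Morrison formula for rank-one updates. The sketch below applies it to a covariance update of the assumed form C' = a·C + b·e·eᵀ. This illustrates the general technique only, not the paper's exact IGMN update equations, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def sherman_morrison(p: Matrix, u: Vector, v: Vector) -> Matrix:
    """Return (A + u v^T)^-1 from P = A^-1 using O(D^2) operations (no matrix inversion)."""
    d = len(p)
    pu = [sum(p[i][j] * u[j] for j in range(d)) for i in range(d)]  # P u
    vp = [sum(v[i] * p[i][j] for i in range(d)) for j in range(d)]  # v^T P
    denom = 1.0 + sum(v[i] * pu[i] for i in range(d))               # 1 + v^T P u
    if abs(denom) < 1e-12:
        raise ValueError("rank-one update makes the matrix singular")
    return [[p[i][j] - pu[i] * vp[j] / denom for j in range(d)] for i in range(d)]


def update_precision(p: Matrix, a: float, b: float, e: Vector) -> Matrix:
    """Precision of C' = a*C + b*e e^T, given P = C^-1 and a > 0 (assumed update form)."""
    p_scaled = [[x / a for x in row] for row in p]  # (a C)^-1 = P / a
    return sherman_morrison(p_scaled, [b * x for x in e], e)


P = [[0.5, 0.0], [0.0, 1.0]]  # precision of C = diag(2, 1)
print(update_precision(P, 0.9, 0.1, [1.0, 2.0]))
# should match inv([[1.9, 0.2], [0.2, 1.3]]) = [[1.3, -0.2], [-0.2, 1.9]] / 2.43
```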

| S-EPMC4596621 | biostudies-literature

Clustering gene expression time series data using an infinite Gaussian process mixture model

Project description: In order to identify and characterize novel human gene expression responses to glucocorticoids, we exposed the human lung adenocarcinoma cell line A549 to the synthetic glucocorticoid dexamethasone for 1, 3, 5, 7, 9, and 11 h, as well as to a paired vehicle control, ethanol. We assayed gene expression with RNA-seq and clustered gene expression profiles using an infinite Gaussian process mixture model.

2017-10-08 | GSE104714 | GEO

Unsupervised assessment of microarray data quality using a Gaussian mixture model.

Project description: Background: Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny. Results: We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternative quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach. Conclusion: This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations.

| S-EPMC2717951 | biostudies-literature

Detecting somatic mutations in genomic sequences by means of Kolmogorov-Arnold analysis.

Project description: The Kolmogorov-Arnold stochasticity parameter technique is applied for the first time to cancer genome sequencing data to reveal mutations. Using data generated by next-generation sequencing technologies, we analysed the exome sequences of brain tumour patients with matched tumour and normal blood samples. We show that mutations in sequencing data can be revealed with this technique, which provides a new methodology for identifying subsequences of a given length that contain mutations, i.e. subsequences whose parameter value differs from that of subsequences without mutations. A potential application of this technique is simplifying the search for segments with mutations, speeding up genomic research and accelerating its implementation in clinical diagnostics. Moreover, the prediction of a mutation associated with a family of frequent mutations in numerous cancer types, based purely on the value of the Kolmogorov function, indicates that this marker may recognize genomic sequences present at extremely low abundance and can be used to reveal new types of mutations.
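The Kolmogorov-Arnold stochasticity parameter mentioned above compares an empirical distribution with a hypothesized one through λ_n = √n · sup_x |F_n(x) − F(x)|. Its value is then read off the Kolmogorov distribution Φ(λ) = Σ_k (−1)^k exp(−2k²λ²). The sketch below computes both for any numeric sequence and any hypothesized CDF. The study's mapping from nucleotides to numbers and its choice of F are not described here, so this is a generic illustration with hypothetical function names.

```python
import math
from typing import Callable, Sequence


def stochasticity_parameter(xs: Sequence[float], cdf: Callable[[float], float]) -> float:
    """lambda_n = sqrt(n) * sup_x |F_n(x) - F(x)| for sample xs and hypothesized CDF F."""
    s = sorted(xs)
    n = len(s)
    d = 0.0
    for i, x in enumerate(s, start=1):
        f = cdf(x)
        # The supremum is attained at, or just before, a jump of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * d


def kolmogorov_phi(lam: float, terms: int = 100) -> float:
    """Phi(lambda) = 1 + 2 * sum_{k>=1} (-1)^k exp(-2 k^2 lambda^2); accurate for lam >= 0.1."""
    if lam <= 0:
        return 0.0
    total = 1.0 + 2.0 * sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, total))  # clamp tiny rounding excursions outside [0, 1]


uniform_cdf = lambda x: min(max(x, 0.0), 1.0)  # hypothesized F: uniform on [0, 1]
lam = stochasticity_parameter([0.125, 0.375, 0.625, 0.875], uniform_cdf)
print(lam, kolmogorov_phi(lam))  # lam = 0.25 for this evenly spaced sample
print(kolmogorov_phi(1.36))      # about 0.95
```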

| S-EPMC4555851 | biostudies-literature

Real-Time Occlusion-Robust Deformable Linear Object Tracking With Model-Based Gaussian Mixture Model.

Project description: Tracking and manipulating deformable linear objects (DLOs) has great potential in industry. However, estimating the object's state is crucial and challenging, especially under heavy occlusion and for objects with different physical properties. To address these problems, we introduce a novel tracking algorithm that observes and estimates the state of a DLO. The proposed tracking algorithm is based on Coherent Point Drift (CPD), which registers the observed point cloud, and on a finite element method (FEM) model, which encodes physical properties. The Gaussian mixture model with CPD regularization generates constraints that deform a given FEM model into the desired shape. The FEM model encodes the local structure, the global topology, and the material properties to better approximate the real-world deformation process without using simulation software. A series of simulations and real-data tracking experiments on deformable objects such as rope and iron wire demonstrate the robustness and accuracy of our method in the presence of occlusion.

| S-EPMC9136076 | biostudies-literature

Heterogeneity in pulmonary emphysema: Analysis of CT attenuation using Gaussian mixture model.

Project description: Purpose: To utilize Gaussian mixture model (GMM) for the quantification of chronic obstructive pulmonary disease (COPD) and to evaluate the combined use of multiple types of quantification. Materials and methods: Eighty-seven patients (67 men, 20 women; age, 67.4 ± 11.0 years) who had undergone computed tomography (CT) and pulmonary function test (PFT) were included. The heterogeneity of CT attenuation in emphysema (HC) was obtained by analyzing a distribution of CT attenuation with GMM. The percentages of low-attenuation volume in the lungs (LAV), wall area of bronchi (WA), and the cross-sectional area of small pulmonary vessels (CSA) were also calculated. The relationships between COPD quantifications and the PFT results were evaluated by Pearson's correlation coefficients and through linear models, with the best models selected using Akaike information criterion (AIC). Results: The correlation coefficients with FEV1 were as follows: LAV, -0.505; HC, -0.277; CSA, 0.384; WA, -0.196. The correlation coefficients with FEV1/FVC were: LAV, -0.640; HC, -0.136; CSA, 0.288; WA, -0.131. For predicting FEV1, the smallest AIC values were obtained in the model with LAV, HC, CSA, and WA. For predicting FEV1/FVC, the smallest AIC values were obtained in the model with LAV and HC. In both models, the coefficient of HC was statistically significant (P-values = 0.000880 and 0.0441 for FEV1 and FEV1/FVC, respectively). Conclusion: GMM was applied to COPD quantification. The results of this study show that COPD severity was associated with HC. In addition, it is shown that the combined use of multiple types of quantification made the evaluation of COPD severity more reliable.

| S-EPMC5812649 | biostudies-literature

Spike sorting with Gaussian mixture models.

Project description: The shape of extracellularly recorded action potentials is a product of several variables, such as the biophysical and anatomical properties of the neuron and the relative position of the electrode. This allows isolating spikes of different neurons recorded in the same channel into clusters based on waveform features. However, correctly classifying spike waveforms into their underlying neuronal sources remains a challenge. This process, called spike sorting, typically consists of two steps: (1) extracting relevant waveform features (e.g., height, width), and (2) clustering them into non-overlapping groups believed to correspond to different neurons. In this study, we explored the performance of Gaussian mixture models (GMMs) in these two steps. We extracted relevant features using a combination of common techniques (e.g., principal components, wavelets) and GMM fitting parameters (e.g., Gaussian distances). Then, we developed an approach to perform unsupervised clustering using GMMs, estimating cluster properties in a data-driven way. We found the proposed GMM-based framework outperforms previously established methods in simulated and real extracellular recordings. We also discuss potentially better techniques for feature extraction than the widely used principal components. Finally, we provide a friendly graphical user interface to run our algorithm, which allows manual adjustments.

| S-EPMC6403234 | biostudies-literature

Global vision object detection using an improved Gaussian Mixture model based on contour.

Project description: Object detection plays an important role in the field of computer vision. The purpose of object detection is to identify the objects of interest in the image and determine their categories and positions. Object detection has many important applications in various fields. This article addresses the problems of unclear foreground contours in moving object detection and excessive noise points in the global vision, proposing an improved Gaussian mixture model for feature fusion. First, the RGB image was converted into the HSV space, and a mixed Gaussian background model was established. Next, the object area was obtained through background subtraction, residual interference in the foreground was removed using the median filtering method, and morphological processing was performed. Then, an improved Canny algorithm using an automatic threshold from the Otsu method was used to extract the overall object contour. Finally, feature fusion of edge contours and the foreground area was performed to obtain the final object contour. The experimental results show that this method improves the accuracy of the object contour and reduces noise in the object.
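The automatic threshold mentioned above comes from Otsu's method. It picks the intensity that maximizes the between-class variance of the histogram. A minimal plain-Python sketch for integer intensities follows. The function name and the 256-level default are our assumptions, not details from the article.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Otsu threshold t for integer intensities in [0, levels); class 0 is {p <= t}.

    Maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2. Using raw counts
    instead of probabilities rescales the criterion but does not change the argmax.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(levels):
        w0 += hist[t]           # pixel count in class 0 (intensities <= t)
        sum0 += t * hist[t]     # intensity sum in class 0
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue            # one class is empty, so this split is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


print(otsu_threshold([10] * 50 + [200] * 50))  # 10: pixels above it form the bright class
```

A common heuristic, assumed here rather than taken from the article, then uses the Otsu value as Canny's high threshold and half of it as the low threshold.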

| S-EPMC10803047 | biostudies-literature