Dataset Information

Data segmentation based on the local intrinsic dimension.

ABSTRACT: One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be proficiently used even on large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded versus unfolded configurations in a protein molecular dynamics trajectory, active versus non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.
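As a rough illustration of what an intrinsic dimension estimate looks like in practice, the sketch below computes a single global ID from the ratio of each point's second to first nearest-neighbor distance. It is not the segmentation approach described in the abstract, which assigns different local IDs to different regions; it is a simpler estimator swapped in for illustration. The function name `twonn_intrinsic_dimension` and the toy data are assumptions made for this example.

```python
import heapq
import math
import random


def twonn_intrinsic_dimension(points):
    """Maximum-likelihood ID estimate from ratios of 2nd to 1st neighbor distances."""
    log_ratios = []
    for i, p in enumerate(points):
        # Two smallest distances from p to any other point (brute force, O(N^2) overall).
        r1, r2 = heapq.nsmallest(
            2, (math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
        if r1 > 0:  # skip exact duplicates, where the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    # If r2/r1 follows a Pareto law with exponent d, log(r2/r1) is exponential
    # with rate d, so the maximum-likelihood estimate of d is the inverse of
    # the mean log-ratio.
    return len(log_ratios) / sum(log_ratios)


# Toy data (assumption): a 2-D sheet embedded linearly in 5 coordinates.
rng = random.Random(0)
sheet = []
for _ in range(600):
    u, v = rng.random(), rng.random()
    sheet.append((u, v, u + v, 0.5 * u, 0.0))

estimated_id = twonn_intrinsic_dimension(sheet)
print(f"estimated ID: {estimated_id:.2f}")  # expected to be close to 2, not 5
```

The points have five coordinates but only two degrees of freedom, so the estimate should land near 2. A data set with heterogeneous ID would give an uninformative average under this global estimator, which is the situation the abstract's local approach is meant to address.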

SUBMITTER: Allegra M

PROVIDER: S-EPMC7536196 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

Publications

Data segmentation based on the local intrinsic dimension.

Allegra Michele, Facco Elena, Denti Francesco, Laio Alessandro, Mira Antonietta

Scientific Reports, 2020-10-05, issue 1

PMID: 33020515

Similar Datasets

Project description:
Background: Image segmentation is the process of partitioning an image into separate objects or regions. It is an essential step in image processing, used to isolate the regions of interest for further processing. We propose a method for segmenting the nuclei and cytoplasm of white blood cells (WBCs).
Methods: The method first computes an initial value based on the minimum and maximum values of the input image. Then a histogram of the input image is computed and approximated to obtain function values. The method searches for the first local maximum and local minimum of the approximated function values, in order of increasing knot sequence. We approximate the required threshold from the first local minimum and the computed initial value, based on defined conditions. The threshold is applied to the input image to binarize it, and post-processing is then performed to obtain the final segmented nucleus. We segment the whole WBC before segmenting the cytoplasm, with the procedure depending on the complexity of the objects in the image. For WBCs that are well separated from red blood cells (RBCs), n thresholds are generated, producing n thresholded images. A standard Otsu method is then used to binarize the average of the produced images. Morphological operations are applied to the binarized image, and a single-pixel point from the segmented nucleus is used to segment the WBC. For images in which RBCs touch the WBCs, we segment the whole WBC using SLIC and watershed methods. The cytoplasm is obtained by subtracting the segmented nucleus from the segmented WBC.
Results: The method is tested on two different public data sets and the results are compared to state-of-the-art methods. The performance analysis shows that the proposed method segments the nucleus and cytoplasm well.
Conclusion: We propose a method for nucleus and cytoplasm segmentation based on the local minima of the approximated function values from the image's histogram. The method has demonstrated its utility in segmenting nuclei, WBCs, and cytoplasm, and the results are satisfactory.
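The description mentions binarizing an averaged image with a standard Otsu method. A minimal sketch of Otsu thresholding on a flat list of 8-bit intensities is given below. It covers only that one step, not the histogram-approximation threshold or the SLIC and watershed stages; the function name `otsu_threshold` and the toy pixel values are assumptions made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance (class 0: value <= t)."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of intensities of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break     # all pixels are in the dark class; nothing left to split
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Toy "image" (assumption): dark background at 10, bright object at 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [v > t for v in pixels]  # True marks the bright object
```

Any threshold between the two intensity groups separates them equally well here; the sketch keeps the first such value it encounters.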

Project description: The thalamus is an essential relay station in the cortical-subcortical connections. It is characterized by a complex anatomical architecture composed of numerous small nuclei, which mediate the involvement of the thalamus in a wide range of neurological functions. We present a novel framework for segmenting the thalamic nuclei, which explores the orientation distribution functions (ODFs) from diffusion magnetic resonance images at 3 T. The differentiation of the complex intra-thalamic microstructure is improved by using the spherical harmonic (SH) representation of the ODFs, which provides full angular characterization of the diffusion process in each voxel. The clustering was performed using the k-means algorithm initialized in a data-driven manner. The method was tested on 35 healthy volunteers and our results show a robust, reproducible and accurate segmentation of the thalamus into seven nuclei groups. Six of them closely matched the anatomy and were labeled as anterior, ventral anterior, medio-dorsal, ventral latero-ventral, ventral latero-dorsal and pulvinar, while the seventh cluster included the centro-lateral and the latero-posterior nuclei. Results were evaluated both qualitatively, by comparing the segmented nuclei to the histological atlas of Morel, and quantitatively, by measuring the clusters' extent and the clusters' spatial distribution across subjects and hemispheres. We also showed the robustness of our approach across different sequences and scanners, as well as intra-subject reproducibility of the segmented clusters using two additional scan-rescan datasets. We also observed an overlap between the path of the main long-connection tracts passing through the thalamus and the spatial distribution of the nuclei identified with our clustering algorithm.
Our approach, based on SH representations of the ODFs, outperforms the one based on angular differences between the principal diffusion directions, which has so far been considered the state-of-the-art method. Our findings show an anatomically reliable segmentation of the main groups of thalamic nuclei that could be of potential use in many clinical applications.
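The clustering step described above can be sketched with a plain k-means loop over feature vectors. The description does not say how the data-driven initialization works, so this sketch simply takes the starting centroids as an argument. The function name `kmeans`, the two-dimensional toy vectors (standing in for per-voxel SH coefficients) and the chosen starting centroids are all assumptions made for this example.

```python
import math


def kmeans(points, initial_centroids, max_iterations=50):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: labels already match the final centroids
        centroids = updated
    return labels, centroids


# Toy feature vectors (assumption): two well-separated groups.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(features, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # one cluster index per feature vector
```

Passing the starting centroids explicitly keeps the initialization strategy separate from the iteration, so a data-driven choice could be substituted without changing the loop.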
