Dataset Information

Analysis of perceived similarity between pairs of microcalcification clusters in mammograms.

ABSTRACT: Content-based image retrieval aims to assist radiologists by presenting example images with known pathology that are visually similar to the case being evaluated. In this work, the authors investigate several fundamental issues underlying the similarity ratings that radiologists assign to pairs of microcalcification (MC) lesions on mammograms: the degree of variability in the similarity ratings, the impact of this variability on agreement between readers in retrieving similar lesions, and the factors contributing to the readers' similarity ratings.

The authors conduct a reader study on a set of 1000 image pairs of MC lesions, in which a group of experienced breast radiologists rate the degree of similarity of each image pair. The image pairs are selected from among the possible pairings of 222 cases (110 malignant, 112 benign) based on quantitative image attributes (features) and the results of a preliminary reader study. Next, the authors apply analysis of variance (ANOVA) to quantify the level of variability in the readers' similarity ratings and study how the variability in individual reader ratings affects consistency between readers. The authors also use the Dice coefficient to measure the extent to which readers agree on the images most similar to a given query. To investigate how the similarity ratings relate to the attributes underlying the cases, the authors study the fraction of perceptually similar images that share the same benign or malignant pathology as the query image. In addition, the authors apply multidimensional scaling (MDS) to embed the cases in a two-dimensional plot according to their mutual perceptual similarity, which allows them to examine how similar lesions relate to one another in terms of benign or malignant pathology and clustered MCs.

The ANOVA results show that the coefficient of determination in the readers' similarity ratings is 0.59.
The variability in the similarity ratings was found to be a limiting factor, leading to only moderate correlation between readers' ratings. The Dice coefficient, which measures agreement between readers in retrieving similar images, ranges from 0.45 to 0.64 across similarity levels for individual readers but is higher (0.59 to 0.78) for ratings averaged over a group of readers. More importantly, the fraction of retrieved cases that match the benign or malignant pathology of the query image was found to increase with the degree of similarity among the retrieved images, reaching an average value as high as 0.69 for the radiologists (p < 10^-4 compared with random guessing). Moreover, an MDS embedding of all the cases shows that cases with the same pathology tend to cluster together and that neighboring cases in the plot tend to be similar in their clustered MCs.

While individual readers exhibit substantial variability in their similarity ratings, similarity ratings averaged over a group of readers can achieve a high level of intergroup consistency and agreement in the retrieval of similar images. More importantly, perceptually similar cases are also likely to be similar in their underlying benign or malignant pathology and in the image features of their clustered MCs, which could be of diagnostic value in computer-aided diagnosis of lesions with clustered MCs.
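
The retrieval-agreement and pathology-match measures described in the abstract, together with a two-dimensional MDS embedding, can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the four-case rating matrices and labels are hypothetical toy data, "retrieval" is assumed to mean taking the top-k most similar cases (the study may instead define retrieved sets by similarity level), similarity is assumed to convert to dissimilarity as (maximum rating minus rating), and classical (Torgerson) MDS is used as one common variant because the abstract does not name the MDS algorithm.

```python
import numpy as np

# Hypothetical toy data: two readers' symmetric similarity ratings (0-10 scale,
# 10 on the diagonal) for four cases, plus each case's pathology label.
READER_A = [[10, 8, 6, 1],
            [8, 10, 5, 2],
            [6, 5, 10, 3],
            [1, 2, 3, 10]]
READER_B = [[10, 3, 7, 5],
            [3, 10, 6, 2],
            [7, 6, 10, 4],
            [5, 2, 4, 10]]
LABELS = ["malignant", "malignant", "benign", "benign"]


def top_k(similarity_row, query, k):
    """Indices of the k cases rated most similar to `query` (query itself excluded)."""
    order = np.argsort(-np.asarray(similarity_row, dtype=float), kind="stable")
    return [int(i) for i in order if i != query][:k]


def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) between two retrieved sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def pathology_match_fraction(retrieved, labels, query):
    """Fraction of retrieved cases whose label equals the query's label."""
    return sum(labels[i] == labels[query] for i in retrieved) / len(retrieved)


def classical_mds(similarity, dims=2):
    """Classical (Torgerson) MDS of a symmetric similarity matrix."""
    s = np.asarray(similarity, dtype=float)
    d = s.max() - s                      # assumed similarity-to-dissimilarity conversion
    np.fill_diagonal(d, 0.0)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ (d ** 2) @ j          # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(b)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]  # keep the largest `dims` eigenvalues
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))


if __name__ == "__main__":
    query, k = 0, 2
    hits_a = top_k(READER_A[query], query, k)
    hits_b = top_k(READER_B[query], query, k)
    average = (np.asarray(READER_A) + np.asarray(READER_B)) / 2
    hits_avg = top_k(average[query], query, k)
    print("reader A:", hits_a, "reader B:", hits_b, "Dice:", dice(hits_a, hits_b))
    print("pathology match (averaged ratings):",
          pathology_match_fraction(hits_avg, LABELS, query))
    print("2-D MDS coordinates:\n", classical_mds(average))
```

In this toy example the two readers share only one of their two retrieved cases for query 0, giving a Dice coefficient of 0.5; averaging the matrices before retrieval mirrors the abstract's group-averaged ratings. As a rough chance reference for the pathology-match fraction, a randomly chosen other case from the 222-case set matches a malignant query with probability 109/221 ≈ 0.49 and a benign query with probability 111/221 ≈ 0.50; whether the study's random-guessing baseline is defined exactly this way is an assumption.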

SUBMITTER: Wang J

PROVIDER: S-EPMC4000405 | biostudies-other | 2014 May

REPOSITORIES: biostudies-other

Similar Datasets

Project description: Background: One primary goal of transcriptomic studies is identifying gene expression patterns that correlate with disease progression. This is usually achieved by considering transcripts that individually pass an arbitrary threshold (e.g. p<0.05). In diseases involving severe perturbations of multiple molecular systems, such as Alzheimer's disease (AD), this univariate approach often results in a large list of seemingly unrelated transcripts. We utilised a powerful multivariate clustering approach to identify clusters of RNA biomarkers strongly associated with markers of AD progression. We discuss the value of considering pairs of transcripts, which, in contrast to individual transcripts, help avoid the natural variation in the human transcriptome that can overshadow disease-related changes. Methodology/principal findings: We re-analysed a dataset of hippocampal transcript levels in nine controls and 22 patients with varying degrees of AD. A large-scale clustering approach determined groups of transcript probe sets that correlate strongly with measures of AD progression, including both clinical and neuropathological measures and quantifiers of the characteristic transcriptome shift from control to severe AD. This enabled identification of restricted groups of highly correlated probe sets from an initial list of 1,372 previously published by our group. We repeated this analysis on an expanded dataset that included all pairwise combinations of the 1,372 probe sets. As clustering this massive dataset is infeasible using standard computational tools, we adapted and re-implemented a clustering algorithm that uses an external-memory algorithmic approach. This identified various pairs that strongly correlated with markers of AD progression and highlighted important biological pathways potentially involved in AD pathogenesis. Conclusions/significance: Our analyses demonstrate that, although there exists a relatively large molecular signature of AD progression, only a small number of transcripts recurrently cluster with different markers of AD progression. Furthermore, considering the relationship between two transcripts can highlight important biological relationships that are missed when considering either transcript in isolation.

OmicsDI is part of the ELIXIR infrastructure
